Selective Disclosure: Privacy Primitive for AI Evaluation

AI safety evaluation needs verified demographic and expertise range in its evaluator pools, but evaluators cannot reasonably be asked to surrender biographical data. Selective disclosure, a W3C Verifiable Credentials primitive (with IETF SD-JWT as the leading implementation), lets evaluators prove a specific attribute without revealing the rest of their credential.

AI safety evaluation has a demographic problem. Models behave differently for different populations. Refusal layers behave differently when prompts arrive in different dialects. Reward models trained on one cohort generalise poorly to another. Bias measurements that ignore evaluator composition produce numbers that look rigorous and answer the wrong question. The community is increasingly aware of this. The methodological answer is to ensure evaluator pools cover the demographic, regional, and expertise range that the model’s deployment will encounter.

The implementation problem is that almost nobody knows how to do this without crossing the line into surveillance.

Suppose a team wants to know that 15% of its safety evaluators speak African American Vernacular English at native fluency, or that the panel includes at least three medical specialists, or that the regional distribution roughly mirrors a target market. The obvious move is to ask. The evaluator self-reports. The platform stores the answer. The team gets a dashboard.

The obvious move is also the wrong one. Self-report is fragile, the platform now holds sensitive demographic data, and the evaluator has to trust both the platform and every downstream consumer of that data to use it responsibly forever. Most evaluators, sensibly, will not. The ones who do are not a representative sample of the population the team is trying to represent.

There is a better primitive. It is called selective disclosure. It is supported across the W3C Verifiable Credentials Data Model 2.0 family, which became a W3C Recommendation in May 2025, and it is almost never invoked in AI evaluation contexts. This piece picks up from yesterday’s argument on proof of personhood without surveillance, and takes the same primitive into the deeper case of attribute verification.

What selective disclosure does

A verifiable credential, as defined in the W3C Verifiable Credentials Data Model 2.0, is a cryptographically signed statement made by an issuer about a subject. The credential might contain many attributes. A medical licensing body issues a credential that includes the licensee’s full name, jurisdiction, specialty, board status, expiry date, and licence number. A government issues a credential that includes legal name, date of birth, address, nationality, and biometric template. A linguistics certification body issues a credential that includes the holder’s certified dialect proficiencies, the dates of certification, and the assessor.

The credential, full and unredacted, is held by the subject. It is stored on their device, in a wallet they control. The issuer does not need to be contacted to use it. The full credential never needs to be sent to the verifier.

When a verifier needs to know one specific attribute, the holder computes a cryptographic proof that the credential exists, was issued by a trusted issuer, has not been revoked, and contains the asserted attribute. The verifier confirms the proof. The verifier learns one fact: this person holds a credential from a trusted issuer that asserts the specified attribute. Nothing else. Not the rest of the credential. Not the holder’s other attributes. And, when credentials are issued with unlinkability in mind (for example, by giving the holder a batch of single-use copies), not any data that can be used to re-identify the holder elsewhere.

The leading implementation of this in production identity systems is IETF RFC 9901, Selective Disclosure for JSON Web Tokens, better known as SD-JWT. It specifies how an issuer replaces each selectively disclosable claim with a salted hash, so that the holder can present a subset of the claims to a verifier while keeping the rest hidden.
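The salted-hash idea behind SD-JWT can be sketched in a few lines of Python. This is a simplified illustration, not a conforming implementation: it omits the issuer's signature over the payload, key binding, nested claims, and decoy digests, and the claim names and values are hypothetical. The function names (`issue`, `verify`, `make_disclosure`) are made up for this example.

```python
import base64
import hashlib
import json
import secrets


def b64url(data: bytes) -> str:
    """Base64url without padding, as used across JWT-family formats."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_disclosure(name: str, value) -> str:
    """Encode one claim as base64url of the JSON array [salt, name, value]."""
    salt = b64url(secrets.token_bytes(16))
    return b64url(json.dumps([salt, name, value]).encode("utf-8"))


def digest(disclosure: str) -> str:
    """SHA-256 over the disclosure string, base64url-encoded."""
    return b64url(hashlib.sha256(disclosure.encode("ascii")).digest())


def issue(claims: dict) -> tuple[dict, dict]:
    """Issuer side: build the payload (digests only) and the disclosures.

    In a real deployment the issuer would sign the payload; that step is
    deliberately left out of this sketch.
    """
    disclosures = {n: make_disclosure(n, v) for n, v in claims.items()}
    payload = {
        "_sd_alg": "sha-256",
        # Sorting hides the original claim order from the verifier.
        "_sd": sorted(digest(d) for d in disclosures.values()),
    }
    return payload, disclosures


def verify(payload: dict, presented: list[str]) -> dict:
    """Verifier side: accept only disclosures whose digest is in the payload."""
    revealed = {}
    for disclosure in presented:
        if digest(disclosure) not in payload["_sd"]:
            raise ValueError("disclosure does not match any digest in the payload")
        padded = disclosure + "=" * (-len(disclosure) % 4)
        _salt, name, value = json.loads(base64.urlsafe_b64decode(padded))
        revealed[name] = value
    return revealed


# Hypothetical medical licence with four claims.
licence = {
    "name": "Dr. Example Holder",
    "licence_number": "HYPOTHETICAL-0001",
    "jurisdiction": "Exampleland",
    "specialty": "cardiology",
}
payload, disclosures = issue(licence)

# The holder chooses to reveal only the specialty.
print(verify(payload, [disclosures["specialty"]]))  # {'specialty': 'cardiology'}
```

The verifier sees four opaque digests and one disclosure. Because each digest is salted, the verifier cannot guess the hidden values by hashing candidate names or licence numbers.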

This is the primitive AI evaluation needs.

What it changes about safety evaluation

Apply it to the demographic problem. A team wants to construct a safety evaluation cohort with verified dialect range. Today, the only options are self-report (unreliable, and prone to leaking demographic data) or demographic surveillance (intrusive, and likely to drive away the evaluators who would provide the range). With selective disclosure, the team’s eval platform requires a credential from a recognised linguistic certification issuer asserting the relevant dialect proficiency. The evaluator presents a proof. The platform learns “this evaluator holds a verified dialect proficiency credential of type X.” It does not learn the evaluator’s name, location, age, or anything else about the credential.

The team gets verified composition. The evaluator gives up no sensitive data. The platform stores no demographic profile.
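What "verified composition" could look like on the platform side is sketched below. This is an illustration under assumptions, not a description of any existing platform: the `VerifiedAttribute` record, the `composition_gaps` function, the attribute labels, and the opaque evaluator handles are all hypothetical. The point is that the only thing stored per evaluator is the single attribute that was proven.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedAttribute:
    """What the platform keeps after checking a presentation."""
    evaluator_handle: str  # opaque, platform-local identifier (hypothetical)
    attribute: str         # the one proven attribute, e.g. "specialty:medical"


def composition_gaps(records, pool_size, min_counts=None, min_percent=None):
    """Return the composition targets the evaluator pool does not yet meet.

    min_counts:  {attribute: minimum number of distinct evaluators}
    min_percent: {attribute: minimum whole-number percentage of the pool}
    """
    holders = {}
    for record in records:
        holders.setdefault(record.attribute, set()).add(record.evaluator_handle)

    gaps = {}
    for attribute, needed in (min_counts or {}).items():
        have = len(holders.get(attribute, ()))
        if have < needed:
            gaps[attribute] = f"{have} of {needed} required evaluators"
    for attribute, percent in (min_percent or {}).items():
        have = len(holders.get(attribute, ()))
        # Integer comparison avoids floating-point edge cases at the threshold.
        if pool_size == 0 or have * 100 < percent * pool_size:
            gaps[attribute] = f"{have} of {pool_size} evaluators, below {percent}%"
    return gaps


# Hypothetical pool of 20 evaluators: three proved the dialect attribute,
# two proved a medical specialty, the rest proved nothing relevant.
records = [
    VerifiedAttribute("ev-01", "dialect:aave-native"),
    VerifiedAttribute("ev-02", "dialect:aave-native"),
    VerifiedAttribute("ev-03", "dialect:aave-native"),
    VerifiedAttribute("ev-04", "specialty:medical"),
    VerifiedAttribute("ev-05", "specialty:medical"),
]
print(composition_gaps(
    records,
    pool_size=20,
    min_counts={"specialty:medical": 3},
    min_percent={"dialect:aave-native": 15},
))  # {'specialty:medical': '2 of 3 required evaluators'}
```

The dashboard from the self-report approach survives, but it is built from proven attributes and opaque handles rather than from a stored demographic profile.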

Now apply it to expertise. A safety eval needs medical specialists. Without selective disclosure, the platform asks the evaluator to upload their licence and trust the platform to handle it. With selective disclosure, the licensing body issues credentials, the evaluator presents proof of board-certified specialty, and the platform learns the specialty without learning the licence number, the issuing jurisdiction, or the evaluator’s address.

Now apply it to lived experience credentials, regional demographic credentials, age verification, professional certifications, language fluency, or any other axis where the team needs verified range but the evaluator cannot reasonably be asked to surrender biographical data. The pattern is identical. The architecture is one primitive.

Why it is not already deployed

The architecture is mature. The standards are published. The implementations exist. The barrier is not technical. It is that AI evaluation platforms grew up in the surveillance-default era of identity, and their entire stack assumes platform-bound, self-reported, attribute-disclosed-on-collection user data. Selective disclosure inverts that. The data lives with the holder. The platform learns only what the holder reveals. The platform’s business model often depends on holding more than that.

The teams that need rigorous evaluator composition are also the teams that should be most uncomfortable with the surveillance default: safety-conscious AI labs, regulated industries, and anyone working in healthcare, legal, or financial contexts. These are the exact teams whose prospective evaluators are most likely to refuse evaluation work if their participation requires permanent biographical exposure.

The clean way through is to require verifiable credentials with selective disclosure. The team gets composition assurance backed by cryptography. The evaluator gets to participate without giving up data they would otherwise refuse to surrender. The platform becomes a thinner trust layer rather than a fat one.

Where Ontology fits

ONT ID supports the W3C Verifiable Credentials Data Model 2.0 family, including the selective-disclosure mechanisms required for this pattern to work in production. Issuers can attest demographic, professional, dialect, regional, and experience credentials. Holders can store them in the ONTO Wallet and present them with surgical control over which attributes any given verifier learns. Verifiers can validate the proofs without contacting the issuer or seeing the underlying credential.

For AI evaluation platforms that want demographic and expertise range without surveillance, this is the architectural shape of the answer. The work to integrate it is plumbing, not research. The harder question is whether the AI evaluation industry will choose the architecture that respects the evaluator, or the one that quietly does not.

Continue reading this week

Tomorrow: Reputation as public infrastructure, on why portable evaluator reputation is the next missing primitive.

Ontology News

Geoff R
