The supply of trusted AI evaluators is bottlenecked not by a shortage of humans but by platform-bound reputation that resets every time an evaluator moves between vendors. Portable reputation, built on W3C Decentralized Identifiers and Verifiable Credentials, is the infrastructure that compresses calibration time and makes the trusted evaluator market elastic.
A list of the flagship open models released in a single recent month is the kind of artefact that would have been unthinkable until quite recently. Each release lands with a detailed technical report, public weights, and a growing community of teams that want to fine-tune the model for their use case before deploying it. Each fine-tune wants preference data. Each preference dataset wants high-quality human evaluators. The evaluator capacity needed to support this rate of release is not slightly higher than the supply. It is much higher.

This is the loudest version of a problem the AI evaluation industry has been quietly nursing for years. The supply of trusted evaluators is bottlenecked by the time it takes to onboard, calibrate, and quality-rate each individual evaluator. Every platform builds this onboarding from scratch. Every team that switches platforms pays for it again. The collective effort spent rebuilding the same evaluator’s reputation across vendors is staggering. It grows with the number of platforms each evaluator passes through, not with the size of the evaluator pool.

The bottleneck is not a shortage of humans. It is a missing piece of infrastructure: portable reputation. This piece extends the argument opened in Thursday’s article on selective disclosure by treating evaluator reputation itself as a verifiable credential.
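The duplicated effort can be made concrete with a deliberately simplified cost model. The symbols are illustrative assumptions, not measured figures. $N$ evaluators each work on $k$ platforms. A full calibration costs $C_{\text{cal}}$, and verifying a portable credential costs $C_{\text{ver}}$:

$$E_{\text{today}} = N k \, C_{\text{cal}}, \qquad E_{\text{portable}} = N \left( C_{\text{cal}} + (k - 1)\, C_{\text{ver}} \right)$$

$$E_{\text{today}} - E_{\text{portable}} = N (k - 1) \left( C_{\text{cal}} - C_{\text{ver}} \right)$$

The duplicated term is proportional to $k - 1$, the number of extra platforms each evaluator joins. None of that spending adds a single evaluator to the pool.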
What “portable” means in practice
Imagine an evaluator who has been quality-rated on Platform A for two years. Their inter-rater agreement sits above the platform median. They have completed thousands of preference comparisons. They hold specialist credentials in code review and clinical text. Platform A’s reputation system knows all of this. When the evaluator moves to Platform B, Platform B knows none of it and starts them on the same onboarding flow as a first-time user. Even if the evaluator could export their history, Platform B would have no way to verify it.

The result is weeks of wasted effort. Platform B re-calibrates an evaluator who was already calibrated. Platform A loses the evaluator, and the reputation they built there is stranded. The evaluator did the work but gets no credit for it once they leave Platform A.

Portable reputation flips this. The evaluator’s reputation lives in a credential they hold, not in a database Platform A controls. Platform B verifies the credential, weighs how far it trusts the issuing platform’s rating methodology, and decides how much of the calibration step to skip. The evaluator’s history of work follows them. Platform A no longer owns the evaluator’s professional record. Platform B no longer has to cold-start every new arrival.

This is not science fiction. It is the obvious application of Verifiable Credentials to a labour market that urgently needs them.
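The verify, weigh, and decide step Platform B performs can be sketched as a single function. This is a minimal illustration under stated assumptions, not a real platform’s logic. The credential is a simplified stand-in for a W3C Verifiable Credential with hypothetical field names. Proof checking is injected as a callable because a real check means resolving the issuer’s DID and verifying a cryptographic proof. The task counts, the saturation threshold, and the multiplicative credit formula are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Mapping

# Hypothetical size of Platform B's full cold-start calibration.
FULL_CALIBRATION_TASKS = 200
# Hypothetical work volume at which an issuer's rating earns full credit.
SATURATION_COMPARISONS = 1000


@dataclass(frozen=True)
class ReputationCredential:
    """Simplified stand-in for a W3C Verifiable Credential carrying an
    evaluator rating. Field names are illustrative, not from the VC spec."""
    issuer: str             # DID of the issuing platform
    subject: str            # DID of the evaluator the claims are about
    agreement_score: float  # hypothetical agreement rating in [0, 1]
    comparisons: int        # preference comparisons completed at the issuer
    valid_until: datetime
    proof: bytes            # opaque proof, checked by the injected verifier


def calibration_tasks_required(
    cred: ReputationCredential,
    holder_did: str,
    issuer_trust: Mapping[str, float],
    verify_proof: Callable[[ReputationCredential], bool],
    now: datetime,
) -> int:
    """Return how many calibration tasks the verifier still requires.

    Any failed check falls back to the full cold-start onboarding."""
    # Check the proof first: the other fields mean nothing until it passes.
    if not verify_proof(cred):
        return FULL_CALIBRATION_TASKS
    # The credential must describe the person presenting it...
    if cred.subject != holder_did:
        return FULL_CALIBRATION_TASKS
    # ...and must not have expired.
    if now >= cred.valid_until:
        return FULL_CALIBRATION_TASKS
    # Unknown issuers earn no credit; known ones are weighted by how far
    # the verifier trusts their rating methodology (a value in [0, 1]).
    trust = issuer_trust.get(cred.issuer, 0.0)
    volume = min(cred.comparisons / SATURATION_COMPARISONS, 1.0)
    credit = trust * cred.agreement_score * volume
    return FULL_CALIBRATION_TASKS - round(FULL_CALIBRATION_TASKS * credit)


def placeholder_verify(cred: ReputationCredential) -> bool:
    # Hypothetical stand-in: a real verifier would resolve the issuer's DID
    # and check the credential's cryptographic proof here.
    return True


if __name__ == "__main__":
    cred = ReputationCredential(
        issuer="did:example:platform-a",
        subject="did:example:evaluator-42",
        agreement_score=0.9,
        comparisons=5000,
        valid_until=datetime(2026, 1, 1, tzinfo=timezone.utc),
        proof=b"",
    )
    required = calibration_tasks_required(
        cred,
        holder_did="did:example:evaluator-42",
        issuer_trust={"did:example:platform-a": 0.8},
        verify_proof=placeholder_verify,
        now=datetime(2025, 1, 1, tzinfo=timezone.utc),
    )
    print(required)  # 56, i.e. 200 - round(200 * 0.8 * 0.9 * 1.0)
```

Falling back to the full onboarding on any failed check keeps the sketch conservative. The multiplicative credit means that an untrusted issuer and a thin work history each independently cap how much calibration can be skipped.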
Web3 already solved this problem
Portability has been a foundational concern in Web3 since well before AI evaluation became the market that most loudly needed it. Decentralised identity, credential standards, and on-chain attestations have all matured around the basic insight that reputation is a property of a person, not of the platform that observed them. The technical building blocks are stable. W3C Decentralized Identifiers anchor a portable identifier the holder controls. W3C Verifiable Credentials provide signed attestations from any trusted issuer. The W3C Bitstring Status List specification lets issuers revoke credentials cleanly when they become invalid. The wider ecosystem around the W3C and the Decentralized Identity Foundation has been developing this stack for the better part of a decade.

What has been missing is a market that obviously and urgently needs it. AI evaluation is becoming that market. Its conditions are precisely the ones the standards were designed for:

- many issuers (eval platforms, certification bodies, employer references, peer endorsements);
- many verifiers (downstream eval platforms, AI labs, research teams);
- many holders (the evaluators themselves);
- and a strong economic incentive for portability, because every platform switch today destroys real value.

The transition is not a question of inventing new infrastructure. It is a question of deploying mature infrastructure into a market that has finally noticed it needs the primitive.
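Before crediting a reputation credential, a verifier would also confirm that the issuer has not revoked it. The sketch below shows the lookup at the heart of a Bitstring Status List check. It assumes the encoding follows the W3C specification as summarised here: a GZIP-compressed bitstring, base64url-encoded without padding and carrying a multibase `u` prefix in the current version, with index 0 at the left-most bit. It skips fetching and verifying the status list credential itself, and the function names are illustrative rather than any library’s API.

```python
import base64
import gzip


def encode_status_list(bitstring: bytes) -> str:
    """GZIP-compress, base64url-encode without padding, add multibase 'u'."""
    compressed = gzip.compress(bitstring)
    return "u" + base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")


def decode_status_list(encoded_list: str) -> bytes:
    """Reverse encode_status_list; also accepts a value without the prefix.

    GZIP output always starts with bytes 0x1f 0x8b, which base64url-encode
    to a leading 'H', so a leading 'u' can only be the multibase prefix."""
    if encoded_list.startswith("u"):
        encoded_list = encoded_list[1:]
    padded = encoded_list + "=" * (-len(encoded_list) % 4)
    return gzip.decompress(base64.urlsafe_b64decode(padded))


def is_revoked(encoded_list: str, status_list_index: int) -> bool:
    """True if the bit at status_list_index is set.

    Index 0 is the left-most (most significant) bit of the first byte."""
    bits = decode_status_list(encoded_list)
    if not 0 <= status_list_index < len(bits) * 8:
        raise ValueError("statusListIndex is outside the status list")
    byte = bits[status_list_index // 8]
    return bool((byte >> (7 - status_list_index % 8)) & 1)


if __name__ == "__main__":
    # Issuer side: a 16 KiB list (131,072 entries), revoking entry 94567.
    bits = bytearray(16 * 1024)
    revoked_index = 94567
    bits[revoked_index // 8] |= 1 << (7 - revoked_index % 8)
    encoded = encode_status_list(bytes(bits))

    # Verifier side: statusListIndex arrives as a string in the credential.
    print(is_revoked(encoded, int("94567")))  # True
    print(is_revoked(encoded, int("94566")))  # False
```

Packing many credentials’ statuses into one large compressed list lets a verifier check revocation by fetching the whole list. This design means the issuer does not learn which credential the verifier is checking.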
What it changes for AI evaluation specifically
When evaluator reputation becomes portable, three things shift.

First, the supply of trusted evaluators becomes effectively elastic. Onboarding is no longer the bottleneck. An evaluator who has been calibrated once can arrive on a new platform with credentials that compress weeks of calibration into a single verification step.

Second, the market for evaluator quality becomes competitive in a new dimension. Eval platforms must compete on the quality of their rating methodology, not on lock-in. If downstream consumers widely trust Platform A’s quality ratings, Platform A becomes a sought-after issuer. If they do not trust Platform B’s ratings, Platform B loses standing as an issuer regardless of its other features.

Third, the cost of switching collapses. Teams that need to migrate eval work between vendors stop paying the cold-start tax. Vendors can no longer hold evaluator history hostage. The whole market becomes more honest about what platforms actually contribute, because lock-in that rests on inertia alone disappears.

This is what reputation as public infrastructure looks like. It is not a single platform. It is a set of standards that any platform can use to issue and verify credentials, with the evaluator as holder and durable anchor.
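A downstream verifier could turn “widely trusted” into a number by spot-checking evaluators who arrive with each issuer’s credential and tracking how often the issuer’s rating holds up. The sketch below is an assumption, not a mechanism the standards define. It keeps a Beta(1, 1) prior per issuer and uses the posterior mean as the trust weight. The class and method names, and the `did:example` identifiers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IssuerRecord:
    confirmed: int = 0     # spot checks where the issuer's rating held up
    contradicted: int = 0  # spot checks where it did not


class IssuerTrustLedger:
    """Per-issuer trust as the posterior mean of a Beta prior.

    With the default Beta(1, 1) prior this is Laplace's rule of succession:
    an issuer nobody has checked yet starts at 0.5."""

    def __init__(self, prior_confirmed: float = 1.0, prior_contradicted: float = 1.0):
        self._records: dict[str, IssuerRecord] = {}
        self._alpha = prior_confirmed
        self._beta = prior_contradicted

    def record_spot_check(self, issuer: str, rating_held_up: bool) -> None:
        record = self._records.setdefault(issuer, IssuerRecord())
        if rating_held_up:
            record.confirmed += 1
        else:
            record.contradicted += 1

    def trust(self, issuer: str) -> float:
        # .get does not insert, so querying an unknown issuer leaves no trace.
        record = self._records.get(issuer, IssuerRecord())
        checks = record.confirmed + record.contradicted
        return (record.confirmed + self._alpha) / (checks + self._alpha + self._beta)


if __name__ == "__main__":
    ledger = IssuerTrustLedger()
    for held_up in [True] * 9 + [False]:
        ledger.record_spot_check("did:example:platform-a", held_up)
    for held_up in [True] * 2 + [False] * 8:
        ledger.record_spot_check("did:example:platform-b", held_up)
    print(round(ledger.trust("did:example:platform-a"), 3))  # 0.833
    print(ledger.trust("did:example:platform-b"))            # 0.25
    print(ledger.trust("did:example:unknown"))               # 0.5
```

A verifier wary of unknown issuers could start from a more pessimistic prior, such as `prior_contradicted=4.0`. New issuers would then begin at 0.2 and have to earn trust through spot checks.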
Where Ontology fits
Ontology has been building reputation primitives on top of decentralised identity since the platform launched. The ONT ID framework supports credentials that can be verified across systems. ONTO Wallet gives the holder direct custody of those credentials. The infrastructure is designed for exactly the pattern that AI evaluation now needs: many issuers, many verifiers, one durable holder, and portability as the default, not the exception.

The teams building the next generation of AI evaluation supply will either rebuild this primitive badly or adopt the mature, standards-based primitive that already exists. The economic pressure favours adoption. The release calendar is doing the persuading.
Tomorrow: Signed content for a world where platforms are AI, moving the argument from reputation to content provenance.
