{"id":894,"date":"2026-05-22T12:40:48","date_gmt":"2026-05-22T12:40:48","guid":{"rendered":"https:\/\/ont.io\/news\/?p=894"},"modified":"2026-05-22T12:40:50","modified_gmt":"2026-05-22T12:40:50","slug":"portable-reputation-ai-evaluation","status":"publish","type":"post","link":"https:\/\/ont.io\/news\/portable-reputation-ai-evaluation\/","title":{"rendered":"Reputation as public infrastructure"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><span lang=\"EN-US\"><em>The supply of trusted AI evaluators is bottlenecked not by a shortage of humans but by platform-bound reputation that resets every time an evaluator moves between vendors. Portable reputation, built on W3C Decentralized Identifiers and Verifiable Credentials, is the infrastructure that compresses calibration time and makes the trusted evaluator market elastic.<\/em><\/span><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">The list of flagship open models released in a single recent month is the kind of artefact that until quite recently would have been unthinkable. Each release lands with detailed technical reports, public weights, and a growing community of teams that want to fine-tune the model for their use case before deploying it. Each fine-tune wants preference data. Each preference dataset wants high-quality human evaluators. The volume of evaluator capacity required to support this rate of release is not slightly higher than the supply. It is much higher.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">This is the loudest version of a problem the AI evaluation industry has been quietly nursing for years. The supply of trusted evaluators is bottlenecked by the time it takes to onboard, calibrate, and quality-rate any individual evaluator. Every platform reinvents this onboarding from scratch. Every team that switches platforms reinvents it again. 
The collective effort spent rebuilding the same evaluator\u2019s reputation across vendors is staggering, and it scales linearly with the number of platforms, not the size of the evaluator pool.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">The bottleneck is not a shortage of humans. It is a missing piece of infrastructure: portable reputation. This piece extends the argument\u00a0<a href=\"https:\/\/ont.io\/news\/selective-disclosure-ai-evaluation\/\" data-type=\"link\" data-id=\"https:\/\/ont.io\/news\/selective-disclosure-ai-evaluation\/\">Thursday\u2019s article on selective disclosure<\/a>\u00a0opened, by treating evaluator reputation itself as a verifiable credential.<\/span><\/p><h2 style=\"margin: 10pt 0cm 0cm; break-after: avoid; font-size: 14pt; font-family: Calibri, sans-serif; color: rgb(79, 129, 189); white-space: normal;\"><a name=\"what-portable-means-in-practice\"><span lang=\"EN-US\">What \u201cportable\u201d means in practice<\/span><\/a><span lang=\"EN-US\"><\/span><\/h2><p class=\"FirstParagraph\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">Imagine an evaluator who has been quality-rated on Platform A for two years. They have a history of inter-rater agreement above platform median. They have completed thousands of preference comparisons. They have specialist credentials in code review and clinical text. Platform A\u2019s reputation system knows all of this. When the evaluator moves to Platform B, Platform B knows none of it. Platform B starts the evaluator on the same onboarding flow as a first-time user. The evaluator\u2019s history is opaque to Platform B. Platform B has no way to verify the claims even if the evaluator could export them.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">The result is wasted weeks. Platform B re-calibrates an evaluator who was already calibrated. Platform A loses an evaluator whose accumulated reputation has no portability. 
The evaluator, who has done the work, gains nothing from having done it once they are no longer on Platform A.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">Portable reputation flips this. The evaluator\u2019s reputation lives in a credential they hold, not in a database Platform A controls. Platform B verifies the credential, computes how much it should trust the issuing platform\u2019s rating methodology, and decides how much of the calibration step to skip. The evaluator\u2019s history of work follows them. Platform A no longer owns the evaluator\u2019s professional record. Platform B no longer has to cold-start every new arrival.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">This is not science fiction. It is the obvious application of Verifiable Credentials to a labour market that desperately needs them.<\/span><\/p><h2 style=\"margin: 10pt 0cm 0cm; break-after: avoid; font-size: 14pt; font-family: Calibri, sans-serif; color: rgb(79, 129, 189); white-space: normal;\"><a name=\"web3-already-solved-this-problem\"><span lang=\"EN-US\">Web3 already solved this problem<\/span><\/a><span lang=\"EN-US\"><\/span><\/h2><p class=\"FirstParagraph\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">The portability problem has been a foundational concern in Web3 since well before AI evaluation became the loudest market that needed it. Decentralised identity, credential standards, and on-chain attestations have all matured around the basic insight that reputation is a property of a person, not a property of the platform that observed them. 
The technical building blocks are stable:\u00a0<a style=\"color: purple;\" href=\"https:\/\/www.w3.org\/TR\/did-1.1\/\" target=\"_blank\" rel=\"noopener\">W3C Decentralized Identifiers<\/a>\u00a0anchor a portable identifier the holder controls;\u00a0<a style=\"color: purple;\" href=\"https:\/\/www.w3.org\/TR\/vc-data-model-2.0\/\" target=\"_blank\" rel=\"noopener\">W3C Verifiable Credentials<\/a>\u00a0provide signed attestations from any trusted issuer; the\u00a0<a style=\"color: purple;\" href=\"https:\/\/www.w3.org\/TR\/vc-bitstring-status-list\/\" target=\"_blank\" rel=\"noopener\">W3C Bitstring Status List specification<\/a>\u00a0allows issuers to revoke credentials cleanly when they become invalid. The broader\u00a0<a style=\"color: purple;\" href=\"https:\/\/identity.foundation\/\" target=\"_blank\" rel=\"noopener\">Decentralized Identity Foundation<\/a>\u00a0ecosystem has been stewarding this stack for nearly a decade.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">What has been missing is a market that obviously and urgently needs them. AI evaluation is becoming that market. The conditions are precisely the ones the standards were designed for: many issuers (eval platforms, certification bodies, employer references, peer endorsements), many verifiers (downstream eval platforms, AI labs, research teams), many holders (the evaluators themselves), and a strong economic incentive for portability because every platform switch today destroys real value.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">The transition is not a question of inventing new infrastructure. 
It is a question of deploying mature infrastructure into a market that has finally noticed it needs the primitive.<\/span><\/p><h2 style=\"margin: 10pt 0cm 0cm; break-after: avoid; font-size: 14pt; font-family: Calibri, sans-serif; color: rgb(79, 129, 189); white-space: normal;\"><a name=\"Xb77ff26da39e1b64561920926cc6d62ff47c45e\"><span lang=\"EN-US\">What it changes for AI evaluation specifically<\/span><\/a><span lang=\"EN-US\"><\/span><\/h2><p class=\"FirstParagraph\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">When evaluator reputation becomes portable, three things shift.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">First, the supply of trusted evaluators becomes effectively elastic. Onboarding is no longer the bottleneck. An evaluator who has been calibrated once can show up on a new platform with credentials that compress weeks of calibration into a single verification step.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">Second, the market for evaluator quality becomes competitive in a new dimension. Eval platforms must compete on the quality of their rating methodology, not on lock-in. If Platform A\u2019s quality ratings are widely trusted by downstream consumers, Platform A becomes a sought-after issuer. If Platform B\u2019s ratings are not trusted, Platform B loses standing regardless of its other features.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">Third, the cost of switching collapses. Teams that need to migrate eval work between vendors stop paying the cold-start tax. Vendors stop being able to hold evaluator history hostage. The whole market becomes more honest about what platforms actually contribute, because the long tail of inertia-based platform lock-in disappears.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">This is what reputation as public infrastructure looks like. It is not a single platform. 
It is a set of standards that any platform can issue, hold, and verify against, with the evaluator as the durable anchor.<\/span><\/p><h2 style=\"margin: 10pt 0cm 0cm; break-after: avoid; font-size: 14pt; font-family: Calibri, sans-serif; color: rgb(79, 129, 189); white-space: normal;\"><a name=\"where-ontology-fits\"><span lang=\"EN-US\">Where Ontology fits<\/span><\/a><span lang=\"EN-US\"><\/span><\/h2><p class=\"FirstParagraph\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">Ontology has been building reputation primitives on top of decentralised identity since the platform launched.\u00a0<a style=\"color: purple;\" href=\"https:\/\/ont.io\/ont-id\">ONT ID<\/a>\u00a0issues credentials that hold across systems.\u00a0<a href=\"https:\/\/onto.app\" target=\"_blank\" rel=\"noopener\">ONTO Wallet<\/a>\u00a0gives the holder direct custody of those credentials. The infrastructure is designed to support exactly the pattern that AI evaluation now needs: many issuers, many verifiers, one durable holder, and portability as the default, not the exception.<\/span><\/p><p class=\"MsoBodyText\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">The teams building the next generation of AI evaluation supply will either rebuild this primitive badly, or adopt the mature standards-based primitive that already exists. The economic pressure is on the side of adoption. 
The release calendar is doing the persuading.<\/span><\/p><div class=\"MsoNormal\" align=\"center\" style=\"margin: 0cm 0cm 10pt; font-family: Cambria, serif; white-space: normal; text-align: center;\"><span lang=\"EN-US\"><hr size=\"0\" width=\"100%\" align=\"center\"><\/hr><\/span><a name=\"continue-reading-this-week\"><span lang=\"EN-US\">Continue reading this week<\/span><\/a><span lang=\"EN-US\"><\/span><\/div><p class=\"FirstParagraph\" style=\"margin: 9pt 0cm; font-size: medium; font-family: Cambria, serif; white-space: normal;\"><span lang=\"EN-US\">Tomorrow:\u00a0Signed content for a world where platforms are AI, moving the argument from reputation to content provenance.<\/span><\/p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The supply of trusted AI evaluators is bottlenecked not by a shortage of humans but by platform-bound reputation that resets every time an evaluator moves between vendors. Portable reputation, built on W3C Decentralized Identifiers and Verifiable Credentials, is the infrastructure that compresses calibration time and makes the trusted evaluator market elastic. 
The list of flagship<\/p>\n","protected":false},"author":5,"featured_media":895,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[170,113,13],"tags":[67,73,117,172,177],"class_list":["post-894","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-data","category-did-and-privacy","tag-ont-id","tag-portable-reputation","tag-decentralised-identity","tag-ai-evaluation","tag-rlhf"],"_links":{"self":[{"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/posts\/894","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/comments?post=894"}],"version-history":[{"count":2,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/posts\/894\/revisions"}],"predecessor-version":[{"id":897,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/posts\/894\/revisions\/897"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/media\/895"}],"wp:attachment":[{"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/media?parent=894"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/categories?post=894"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/tags?post=894"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}