{"id":927,"date":"2026-06-22T16:13:35","date_gmt":"2026-06-22T16:13:35","guid":{"rendered":"https:\/\/ont.io\/news\/?p=927"},"modified":"2026-06-22T16:13:39","modified_gmt":"2026-06-22T16:13:39","slug":"human-oversight-documentation","status":"publish","type":"post","link":"https:\/\/ont.io\/news\/human-oversight-documentation\/","title":{"rendered":"When human oversight becomes a compliance requirement"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><strong>Human oversight documentation<\/strong>&nbsp;is the auditable record proving that real, identifiable people reviewed and shaped a model during training, and as recursive self-improvement moves up the regulatory agenda it is shifting from internal good practice toward a likely compliance requirement. What an auditor adds to the picture is provenance. It is no longer enough to assert that humans were in the loop. You have to show which humans, that they were distinct real people rather than sybils or one-shot contractors, that their judgement held up over time, and that every contribution they made is attributable, timestamped and tamper-evident. That is a different and harder thing than having had good intentions about oversight, and most training pipelines are not built to produce it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We closed last week on a question and this week opens by taking it seriously. The regulatory ground is already moving. The&nbsp;<a href=\"https:\/\/artificialintelligenceact.eu\" target=\"_blank\" rel=\"noopener\">EU AI Act<\/a>&nbsp;requires human oversight for high-risk AI systems under Article 14, and a leading safety-focused lab spent recent weeks arguing publicly for caution over systems that might begin to improve themselves without a human in the loop. Put those two together and the live question is no longer whether humans should oversee frontier training. 
It is whether labs will soon have to evidence that they did, to someone whose job is to disbelieve them until shown otherwise.<\/p>\n\n\n\n<div id=\"onto-audit-check\">\n  <div class=\"oac-swatch\"><\/div>\n  <div class=\"oac-eyebrow\">Ontology Roundup &middot; Issue 05 &middot; Monday<\/div>\n  <h2 class=\"oac-title\">Is your human-oversight documentation audit-ready?<\/h2>\n  <p class=\"oac-intro\">Answer the six questions an auditor would ask about the people who reviewed your model during training. Be honest: the gaps are the point. Each answer reveals the primitive that closes it.<\/p>\n\n  <div class=\"oac-meter-wrap\">\n    <div class=\"oac-meter-label\"><span>Audit-readiness<\/span><b class=\"oac-score\">0 of 6 evidenced<\/b><\/div>\n    <div class=\"oac-meter\"><i class=\"oac-bar\"><\/i><\/div>\n  <\/div>\n\n  <div class=\"oac-questions\"><\/div>\n\n  <div class=\"oac-result\">\n    <h3 class=\"oac-rtitle\">Your readiness<\/h3>\n    <div class=\"oac-verdict\"><\/div>\n    <p class=\"oac-rbody\"><\/p>\n    <ul class=\"oac-rgaps\"><\/ul>\n  <\/div>\n\n  <p class=\"oac-foot\">Illustrative self-check, not a compliance assessment. 
Self-contained: no tracking, no storage.<\/p>\n\n  <style>\n    #onto-audit-check{\n      --oac-navy:#02101C; --oac-blue:#48A3FF; --oac-ink:#EAF2FB;\n      --oac-muted:#8FA6BF; --oac-card:#0A1B2B; --oac-line:#173049;\n      background:var(--oac-navy); color:var(--oac-ink);\n      font-family:\"Helvetica Neue\",Arial,sans-serif; line-height:1.5;\n      max-width:640px; margin:24px auto; padding:28px 24px;\n      border-radius:16px; border:1px solid #143049;\n    }\n    #onto-audit-check *{box-sizing:border-box; margin:0; padding:0}\n    #onto-audit-check .oac-swatch{width:34px;height:6px;border-radius:3px;background:var(--oac-blue);margin-bottom:18px}\n    #onto-audit-check .oac-eyebrow{font-size:12px;letter-spacing:.16em;text-transform:uppercase;color:var(--oac-blue);font-weight:700;margin-bottom:10px}\n    #onto-audit-check .oac-title{font-size:25px;line-height:1.2;font-weight:800;letter-spacing:-.01em;margin-bottom:12px;color:var(--oac-ink)}\n    #onto-audit-check .oac-intro{font-size:16px;color:#CBDAEB;margin-bottom:8px}\n    #onto-audit-check .oac-meter-wrap{padding:16px 0 18px;position:sticky;top:0;background:var(--oac-navy);z-index:2}\n    #onto-audit-check .oac-meter-label{display:flex;justify-content:space-between;align-items:baseline;font-size:13px;color:var(--oac-muted);margin-bottom:7px}\n    #onto-audit-check .oac-meter-label b{color:var(--oac-ink);font-size:15px}\n    #onto-audit-check .oac-meter{height:10px;border-radius:6px;background:#102234;overflow:hidden;border:1px solid var(--oac-line)}\n    #onto-audit-check .oac-bar{display:block;height:100%;width:0;background:var(--oac-blue);transition:width .4s ease}\n    #onto-audit-check .oac-q{background:var(--oac-card);border:1px solid var(--oac-line);border-radius:14px;padding:20px 20px 18px;margin-bottom:14px}\n    #onto-audit-check .oac-qhead{display:flex;gap:12px;align-items:flex-start}\n    #onto-audit-check .oac-qnum{color:var(--oac-blue);font-weight:800;font-size:20px;min-width:20px}\n    
#onto-audit-check .oac-q h4{font-size:18px;line-height:1.3;font-weight:700;color:var(--oac-ink)}\n    #onto-audit-check .oac-sub{font-size:14px;color:var(--oac-muted);margin:8px 0 14px 32px}\n    #onto-audit-check .oac-opts{display:flex;gap:8px;margin-left:32px;flex-wrap:wrap}\n    #onto-audit-check .oac-opt{background:transparent;border:1px solid #28486b;color:var(--oac-ink);padding:8px 14px;border-radius:9px;font-size:14px;cursor:pointer;font-family:inherit}\n    #onto-audit-check .oac-opt:hover{border-color:var(--oac-blue)}\n    #onto-audit-check .oac-opt.oac-sel{background:var(--oac-blue);border-color:var(--oac-blue);color:#02101C;font-weight:700}\n    #onto-audit-check .oac-reveal{margin:14px 0 0 32px;padding:12px 14px;border-left:2px solid var(--oac-blue);background:#0c2032;border-radius:0 8px 8px 0;font-size:14px;color:#CBDAEB;display:none}\n    #onto-audit-check .oac-reveal.oac-show{display:block}\n    #onto-audit-check .oac-reveal .oac-prim{color:var(--oac-blue);font-weight:700}\n    #onto-audit-check .oac-result{background:var(--oac-card);border:1px solid var(--oac-blue);border-radius:14px;padding:24px 22px;margin-top:6px;display:none}\n    #onto-audit-check .oac-result.oac-show{display:block}\n    #onto-audit-check .oac-result h3{font-size:21px;font-weight:800;margin-bottom:10px;color:var(--oac-ink)}\n    #onto-audit-check .oac-result p{font-size:15px;color:#CBDAEB;margin-bottom:10px}\n    #onto-audit-check .oac-verdict{font-size:16px;color:var(--oac-blue);font-weight:700;margin-bottom:12px}\n    #onto-audit-check .oac-rgaps{list-style:none;margin:6px 0 0}\n    #onto-audit-check .oac-rgaps li{padding:7px 0;border-bottom:1px solid #122740;font-size:14px;color:#CBDAEB}\n    #onto-audit-check .oac-rgaps li span{color:var(--oac-blue);font-weight:700}\n    #onto-audit-check .oac-foot{font-size:12px;color:var(--oac-muted);margin-top:18px;text-align:center}\n  <\/style>\n\n  <script>\n  (function(){\n    var root = document.getElementById('onto-audit-check');\n 
   if(!root || root.getAttribute('data-oac-init')) return;\n    root.setAttribute('data-oac-init','1');\n\n    var QS = [\n      { q:\"Which humans, specifically?\",\n        sub:\"Not a head-count. Named, persistent identities you can point to across the whole project, that survive the contract ending.\",\n        prim:\"W3C Decentralized Identifier (DID)\",\n        note:\"A stable identifier each evaluator controls and no platform owns, so the record does not vanish when the contract does.\" },\n      { q:\"Were they distinct real people?\",\n        sub:\"Proof that fifty reviewers are not five operators behind fifty accounts.\",\n        prim:\"anti-sybil proof of personhood\",\n        note:\"Without it, the oversight head-count is theatre and inter-rater agreement measures collusion, not corroboration.\" },\n      { q:\"Was their judgement consistent over time?\",\n        sub:\"Whether the same person reached the same judgement on equivalent cases through the project, and whether drift was caught.\",\n        prim:\"longitudinal consistency credential (VC)\",\n        note:\"Carried by the evaluator, not locked in the lab's own database where it cannot be independently checked.\" },\n      { q:\"Is each judgement attributable and fixed in time?\",\n        sub:\"Traceable to the individual who made it, and to when, so the trail cannot be quietly rewritten later.\",\n        prim:\"signed, timestamped Verifiable Credentials\",\n        note:\"This is the difference between a methodology section and an audit trail an inspector can test.\" },\n      { q:\"Can a compromised reviewer be flagged?\",\n        sub:\"If an evaluator is later found to have gamed the process, their prior work must be markable.\",\n        prim:\"W3C Bitstring Status List (revocation)\",\n        note:\"Revocation means old work can be flagged rather than silently trusted after the fact.\" },\n      { q:\"Is the evidence portable, or hostage to one vendor?\",\n        sub:\"Records 
that only exist inside one platform fail the moment that platform changes terms or disappears.\",\n        prim:\"portable, self-custodied credentials\",\n        note:\"Portable credentials keep the evidence with the people and the project, not the supplier.\" }\n    ];\n\n    var answers = new Array(QS.length).fill(null);\n    var qWrap = root.querySelector('.oac-questions');\n    var bar = root.querySelector('.oac-bar');\n    var scoreEl = root.querySelector('.oac-score');\n    var resultEl = root.querySelector('.oac-result');\n    var verdictEl = root.querySelector('.oac-verdict');\n    var bodyEl = root.querySelector('.oac-rbody');\n    var gapsEl = root.querySelector('.oac-rgaps');\n\n    QS.forEach(function(item, i){\n      var card = document.createElement('div');\n      card.className = 'oac-q';\n      card.innerHTML =\n        '<div class=\"oac-qhead\"><div class=\"oac-qnum\">'+(i+1)+'<\/div><h4>'+item.q+'<\/h4><\/div>'+\n        '<div class=\"oac-sub\">'+item.sub+'<\/div>'+\n        '<div class=\"oac-opts\">'+\n          '<button type=\"button\" class=\"oac-opt\" data-a=\"yes\">Yes, on record<\/button>'+\n          '<button type=\"button\" class=\"oac-opt\" data-a=\"partial\">Partly<\/button>'+\n          '<button type=\"button\" class=\"oac-opt\" data-a=\"no\">No<\/button>'+\n        '<\/div>'+\n        '<div class=\"oac-reveal\">Closed by: <span class=\"oac-prim\">'+item.prim+'<\/span>. 
'+item.note+'<\/div>';\n      qWrap.appendChild(card);\n\n      var opts = card.querySelectorAll('.oac-opt');\n      var reveal = card.querySelector('.oac-reveal');\n      opts.forEach(function(btn){\n        btn.addEventListener('click', function(){\n          opts.forEach(function(b){ b.classList.remove('oac-sel'); });\n          btn.classList.add('oac-sel');\n          answers[i] = btn.getAttribute('data-a');\n          reveal.classList.add('oac-show');\n          update();\n        });\n      });\n    });\n\n    function update(){\n      var yes = answers.filter(function(a){ return a==='yes'; }).length;\n      var answered = answers.filter(function(a){ return a!==null; }).length;\n      bar.style.width = (yes\/QS.length*100)+'%';\n      scoreEl.textContent = yes+' of '+QS.length+' evidenced';\n\n      if(answered === QS.length){\n        var gaps = [];\n        answers.forEach(function(a,i){ if(a!=='yes') gaps.push(i); });\n        resultEl.classList.add('oac-show');\n        var verdict, body;\n        if(yes === QS.length){\n          verdict = \"Audit-ready: every question has a verifiable record behind it.\";\n          body = \"Each claim about your evaluators is backed by a credential an auditor could check independently. That is provenance, not attestation, and it is the same record that makes your evaluation defensible on quality grounds.\";\n        } else if(yes >= 4){\n          verdict = \"Mostly attestation, not yet provenance.\";\n          body = \"You could tell an auditor most of the story, but the gaps below are exactly where \\\"trust us\\\" replaces a checkable record. Those are the items to wire into the pipeline first.\";\n        } else {\n          verdict = \"This is attestation. An auditor would not be able to verify your oversight.\";\n          body = \"You can describe your process, but little of it is evidenced in a way a third party could test. 
Each gap below maps onto a primitive that turns the claim into a record.\";\n        }\n        verdictEl.textContent = verdict;\n        bodyEl.textContent = body;\n        gapsEl.innerHTML = gaps.length ? '' : '<li>No gaps. Every question is backed by a verifiable record.<\/li>';\n        gaps.forEach(function(i){\n          var li = document.createElement('li');\n          li.innerHTML = '<span>'+(i+1)+'.<\/span> '+QS[i].q+' &nbsp;&rarr;&nbsp; '+QS[i].prim;\n          gapsEl.appendChild(li);\n        });\n      }\n    }\n  })();\n  <\/script>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">From \u201cwe had humans in the loop\u201d to \u201cprove it\u201d<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">There is a quiet but enormous distance between those two sentences. The first is a description of a process. The second is a demand for evidence, and evidence has properties that good intentions do not. It has to point at specific people. It has to survive the claim that those people were not who the lab says they were, or were the same handful of contractors wearing different names, or drifted in their judgement halfway through the project and nobody noticed. An auditor is not satisfied by a methodology section. They want the underlying record, and they want to be able to test it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Most teams cannot produce that record today, not because they were careless but because the tools they used were never designed to leave one. Crowd platforms issue worker IDs that mean nothing outside the platform and vanish when the contract ends. Internal review is logged, if at all, in systems that the reviewing team controls and can therefore edit. Consistency is measured at onboarding and rarely tracked after. None of this is dishonest. 
It simply is not provenance, and the gap only becomes visible the moment a third party asks to see the proof.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Attestation is a claim; provenance is a record<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The cleanest way to see the problem is to separate two things that usually get bundled together. Attestation is a lab saying, in effect, \u201ctrust us, qualified humans reviewed this.\u201d Provenance is a record that lets someone who does not trust the lab check the claim for themselves. Attestation scales beautifully and proves nothing. Provenance is harder to produce and is the only thing an auditor can actually act on.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This distinction is not new to anyone who works in regulated industries. A pharmaceutical company cannot tell an inspector that its batch records are fine and expect that to end the conversation. The records have to exist, be attributable to named, qualified individuals, be timestamped, and be tamper-evident, so that the inspector can reconstruct what happened without taking the company&#8217;s word for it. Frontier AI training is drifting toward the same standard for the same reason: the stakes have risen far enough that self-attestation is no longer considered adequate. The interesting part is that the AI field already has the primitives to produce real provenance. They are just not yet wired into the evaluation pipeline.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What an auditor would actually ask for<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Strip the compliance language away and an audit of human oversight reduces to a short list of questions about the people who did the overseeing. Each one is answerable today with a credential rather than a promise.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Which humans, specifically?\u00a0<\/strong>Not a head-count, but named, persistent identities you can point to across the whole project. 
A Decentralized Identifier gives each evaluator a stable identifier they control and no platform owns, so the record survives the contract.<\/li>\n\n\n\n<li><strong>Were they distinct real people?\u00a0<\/strong>Anti-sybil proof of personhood is what stops a pool of fifty reviewers turning out to be five operators behind fifty accounts. Without it, the oversight head-count is theatre.<\/li>\n\n\n\n<li><strong>Was their judgement consistent?\u00a0<\/strong>A longitudinal consistency record, carried by the evaluator rather than locked in the lab&#8217;s database, shows whether the same person reached the same judgement on equivalent cases over the life of the project, and whether drift was caught.<\/li>\n\n\n\n<li><strong>Is each judgement attributable and fixed in time?\u00a0<\/strong>Signed, timestamped Verifiable Credentials make an individual judgement traceable to the individual who made it, and to when, so the audit trail cannot be quietly rewritten afterwards.<\/li>\n\n\n\n<li><strong>Can a compromised reviewer be flagged?\u00a0<\/strong>A Bitstring Status List provides revocation, so if an evaluator is later found to have gamed the process their prior work can be marked rather than silently trusted.<\/li>\n\n\n\n<li><strong>Is the evidence portable, or hostage to one vendor?\u00a0<\/strong>Records that only exist inside one platform fail the moment that platform changes terms or disappears. Portable, self-custodied credentials keep the evidence with the people and the project, not the supplier.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Read that list back and the striking thing is how little of it is speculative. 
Every item maps onto a primitive that already exists, most of them as published W3C specifications:\u00a0<a href=\"https:\/\/www.w3.org\/TR\/did-1.1\/\" target=\"_blank\" rel=\"noopener\">Decentralized Identifiers<\/a>,\u00a0<a href=\"https:\/\/www.w3.org\/TR\/vc-data-model-2.0\/\" target=\"_blank\" rel=\"noopener\">Verifiable Credentials<\/a>\u00a0with selective disclosure, anti-sybil personhood, and the <a href=\"https:\/\/www.w3.org\/TR\/vc-bitstring-status-list\/\" target=\"_blank\" rel=\"noopener\">Bitstring Status List<\/a>\u00a0for revocation. The\u00a0<a href=\"https:\/\/identity.foundation\/\" target=\"_blank\" rel=\"noopener\">Decentralized Identity Foundation<\/a>\u00a0maintains the interoperability work that keeps such credentials portable across platforms. What has been missing is not the technology but the reason to deploy it. Compliance is rapidly becoming that reason.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The substrate, reframed<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">For most of the past year the case for verifiable evaluator identity has been a quality argument: you get better evaluation when you can prove your evaluators are real, distinct and consistent. That argument still holds. What is changing is that the same infrastructure is starting to answer a second question from a different and less forgiving audience. A regulator does not care whether your evaluation was high quality in the abstract. They care whether you can prove who did it and stand behind the record. The remarkable thing is that the answer to both questions is the same stack.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Ontology is the substrate that produces that record, through <a href=\"https:\/\/ont.id\" target=\"_blank\" rel=\"noopener\">ONT ID<\/a> and <a href=\"https:\/\/onto.app\" target=\"_blank\" rel=\"noopener\">ONTO Wallet<\/a>. It is worth being precise about the claim. 
Ontology is not a compliance-as-a-service product and does not audit anyone; it provides the identity and credential primitives that let evaluators carry a portable, verifiable record of who they are and what they judged. Whether a given lab needs that record for quality, for an auditor, or for both is the lab&#8217;s call. The point of this week is that the second reason is arriving faster than most people expected, and the teams that already treated evaluator provenance as infrastructure will find they built their compliance evidence by accident. Tomorrow we take the same question into agentic systems, where the human in the loop becomes a human at specific checkpoints and documenting which ones is harder still.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Human oversight documentation&nbsp;is the auditable record proving that real, identifiable people reviewed and shaped a model during training, and as recursive self-improvement moves up the regulatory agenda it is shifting from internal good practice toward a likely compliance requirement. What an auditor adds to the picture is provenance. 
It is no longer enough to assert<\/p>\n","protected":false},"author":5,"featured_media":931,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[170,113,13],"tags":[25,70,72,150,198,199,200,201],"class_list":["post-927","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-data","category-did-and-privacy","tag-decentralized-identity","tag-ontology","tag-verifiable-credentials","tag-eu-ai-act","tag-human-oversight-documentation","tag-ai-governance","tag-ai-compliance","tag-evaluation-provenance"],"_links":{"self":[{"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/posts\/927","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/comments?post=927"}],"version-history":[{"count":3,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/posts\/927\/revisions"}],"predecessor-version":[{"id":930,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/posts\/927\/revisions\/930"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/media\/931"}],"wp:attachment":[{"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/media?parent=927"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/categories?post=927"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ont.io\/news\/wp-json\/wp\/v2\/tags?post=927"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}