Privacy, Data and the Future of AI Data

Why we brought this conversation back

AI data privacy is no longer a niche concern. The Ontology Privacy Hour returned this month in a new video format, on X and YouTube, to tackle how artificial intelligence has changed the stakes. The questions that have always sat at the heart of Ontology (identity, privacy, decentralisation) have not gone away, but the context around them has changed beyond recognition. Generative AI has turned a slow-burning argument about data rights into an urgent, mainstream problem. The tools most people now use every day are trained on content we produced, and increasingly fed with the things we type into a prompt box. The honeypot just got bigger, faster, and far more valuable.

This recap pulls together the most important threads from the conversation, the framings worth carrying forward, and where Ontology sees its role in what comes next. Each section will be expanded into its own follow-up piece in the coming weeks, and we will link those out from here as they publish.

Watch the full episode on YouTube

Meet the guests

The episode was hosted by Humpty Calderon, with three guests bringing very different vantage points on the same problem.

Juliun is a long-time builder in the blockchain space and is currently working on content provenance tooling for artists, guilds and studios in Los Angeles and Hollywood, including work with the Coalition for Content Provenance and Authenticity (C2PA). His company is preparing to launch its first public product, Monolith, in the coming weeks.

Nick Ris has spent twenty-five years at the cutting edge of B2B technology, the last nine focused specifically on decentralised identity. He was part of the team that built Sovrin, the first blockchain designed for identity, and signed early enterprise customers including Cisco, T-Mobile and IBM. He now runs a ventures consultancy helping organisations make sense of digital trust, user-centric identity and the new market forces around data.

Geoff Richards, Ontology’s Chief Communications Officer, joined to ground the conversation in how these questions are starting to land for ordinary users, and where Ontology and ONTO Wallet fit in.

Follow the guests: Juliun on X | Stability on X | Nick on LinkedIn | Mission on LinkedIn | Geoff on X | Ontology on X | Humpty on X

The honeypot is broken: we need a new bucket

Nick offered the strongest single image of the conversation. Today’s security industry, he argued, treats data protection like a bucket with leaks: every breach prompts another patch, another layer, another control. Each year the bucket grows bigger and the leaks more numerous. The fundamental model is wrong. Concentrated honeypots of data will always attract attackers, and AI has now democratised the ability to launch sophisticated attacks at scale. CIOs and CISOs are reporting exponential growth in breach attempts over the last twelve months. One global telco he spoke to recently is fielding 200 billion cyber attacks per day.

The answer is not a better bucket. It is a different shape entirely: data distributed across nodes, whether those nodes are organisations, people, agents or devices, with each party controlling its own slice. And critically, the data on its own is not enough. It must be tied to the person, or the agent, authorising its use. In the new model, knowing somebody’s social security number is worthless unless you can also prove that the person showing it to you is actually them.

We need a new bucket… any time you’ve got that concentration of data, somebody will find a way. – Nick

Peak human content: 2025 was the line

Juliun introduced what may become the most quoted framing of the year. 2025, he argued, was the last year in which most of the content on the internet was made by humans. From this point forward, humans will always be in second place. Ninety-nine per cent of new data will be AI generated, and increasingly indistinguishable from human work.

2025 was the last year that humans produced most content. Most data was made by humans. That was it. That was the best we’ll ever get. – Juliun

There is a strange consequence to this. If everything is assumed to be synthetic by default, then in a sense everything is private by default. The thing that needs protecting is not the data itself but the moment a human chooses to say: this is mine. Authentication of personhood becomes the entire game. The data is noise; the signature is the signal.

This is a contrarian position, and not everyone will land on the same conclusion. But it is a useful inversion. It moves the problem from defending an ever-expanding surface area to securing a much smaller, much more defensible one.

From ownership to consent, audit and revocation

Nick pushed back gently on the framing the industry has been using for years. The dominant pitch for decentralised identity has been data ownership: own your data, control your data, sell your data. He thinks that is the wrong lens. Ownership is not really the point, and arguing about it gets us tangled in metaphors that do not fit how data actually works.

The more useful frame is consent, audit and revocation. Can I give permission for this data to be used, in this context, for this purpose? Can I see where it has been used? Can I take that permission back? These three capabilities, together, give an individual meaningful agency without requiring the philosophical contortions that come with trying to own a copy of a number.
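To make the three capabilities concrete, here is a minimal sketch of a consent ledger in Python. Every name here (ConsentGrant, ConsentLedger, the example grant) is hypothetical illustration, not an Ontology API; the point is only that grant, audit and revoke are small, well-defined operations once you stop arguing about ownership.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentGrant:
    """One permission: who may use which data, for what purpose."""
    grantee: str
    data_scope: str
    purpose: str
    revoked: bool = False
    audit_log: list = field(default_factory=list)

class ConsentLedger:
    def __init__(self):
        self.grants = {}

    def grant(self, grant_id, grantee, data_scope, purpose):
        """Consent: permission for this data, in this context, for this purpose."""
        self.grants[grant_id] = ConsentGrant(grantee, data_scope, purpose)

    def record_use(self, grant_id, context):
        """Audit: every use of the data is logged against its grant."""
        g = self.grants[grant_id]
        if g.revoked:
            raise PermissionError("consent has been revoked")
        g.audit_log.append((datetime.now(timezone.utc).isoformat(), context))

    def revoke(self, grant_id):
        """Revocation: the permission can be taken back at any time."""
        self.grants[grant_id].revoked = True

# Hypothetical usage: sharing blood-test data with one app, for one purpose.
ledger = ConsentLedger()
ledger.grant("g1", grantee="health-app", data_scope="blood-tests", purpose="interpretation")
ledger.record_use("g1", "model inference run")
ledger.revoke("g1")
```

Note where the burden sits: the data user must check the ledger before every use, and a revoked grant refuses further use. The individual never needs physical custody of the data to retain agency over it.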

This is a positioning shift Ontology and ONTO Wallet should take seriously. It is a clearer story, it maps more closely to how regulators are starting to think, and it puts the burden where it belongs: on the systems that use the data, not on the individual to retain physical custody of it.

Public privacy versus private privacy

One of the most underexplored ideas in the conversation came from Geoff. Most of the privacy debate, he pointed out, focuses on what we share publicly: posts, photos, comments, the trail we leave behind on social platforms. But there is a second category that is growing far faster and getting almost no scrutiny. It is what we share privately, directly, with AI models.

His own example: a recent set of blood tests, uploaded into a Claude project so the model could help interpret them. Useful, immediate, transformative even. But also: handed over without any meaningful sense of where that data goes, how it is retained, what it is used for, or who else might eventually see it. Most people using AI tools today are doing some version of this. The terms of the trade are not clear, and for the most part, users are not asking.

They don’t exist without our data, but actually they’re not useful to us without our data. – Geoff

Public privacy is about what you broadcast. Private privacy is about what you whisper directly into the ear of a system you do not own. The first has decades of debate and regulation behind it. The second is largely unguarded. Both matter, and they are not the same problem.

Attribution is the asset

Juliun made a second sharp point that follows directly from the first. If data itself is becoming worthless, what is valuable is attribution. His example was Bloomberg. If Bloomberg says Tesla’s stock is at a particular price, that statement carries weight. If a random anonymous account on a social platform says the same thing, it does not. The data is identical. The source is everything.

Push this idea further and a strange but compelling future appears. Bloomberg, in this world, does not need to protect its raw data at all. Everything could be open. The only thing it protects is the moment it stamps a piece of information with its name. Attribution becomes the product. The attack surface shrinks to almost nothing. And the same logic applies to individuals: I do not need to defend every byte of data I leak into the world if the only thing that carries value is the moment I attach my verified self to a statement.
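The mechanics of "attribution is the product" are worth sketching. Below is a toy Python example using an HMAC as a stand-in for a real digital signature (a production scheme would use an asymmetric keypair such as Ed25519, so anyone can verify without holding the secret). The key and the statement are invented for illustration. The statement itself is fully public; the only protected thing is the stamp.

```python
import hmac
import hashlib

def attribute(statement: bytes, signing_key: bytes) -> str:
    """Stamp a public statement with a keyed tag; the data itself stays open."""
    return hmac.new(signing_key, statement, hashlib.sha256).hexdigest()

def verify(statement: bytes, tag: str, signing_key: bytes) -> bool:
    """Anyone holding the key can check that the source really said this."""
    return hmac.compare_digest(tag, attribute(statement, signing_key))

# Hypothetical: the source's secret key and a public market statement.
key = b"publisher-signing-key"
stmt = b"TSLA last trade: 250.00"
tag = attribute(stmt, key)

assert verify(stmt, tag, key)                              # the stamp checks out
assert not verify(b"TSLA last trade: 999.00", tag, key)    # tampered data fails
```

The attack surface is exactly one secret. Leak every statement ever published and nothing of value is lost; leak the key and everything is.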

This is a future in which proof of personhood, verifiable credentials and on-chain attribution all become load-bearing infrastructure. It is also, not coincidentally, the world Ontology has been building toward for years.

Context is the new surface area

Most people, Juliun argued, do not actually talk to large language models any more. They talk to agents. When you say hello to ChatGPT and it greets you by name, that is not the model knowing you. It is context being injected into the prompt: a system prompt, a user profile, a vector search across your history, a re-ranking step, your conversation memory. All of that gets sent with every interaction. The model, in any meaningful sense, knows nothing about you. The context layer knows everything.

This matters for two reasons. The first is performance: a well-engineered context can make a mediocre model outperform a much larger one, which has real implications for cost, latency and where the value sits in the AI stack. The second is privacy: the context layer is where the sensitive data actually lives. It is the surface that needs protecting, the surface that needs portability, and the surface where integrity guarantees matter most.
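The injection step is simple enough to sketch. This is a deliberately naive Python illustration (the function, section names and truncation rule are all assumptions, not any particular vendor's pipeline): the agent concatenates the pieces listed above into one prompt, every turn, and the model sees nothing else.

```python
def build_context(system_prompt, user_profile, memory, retrieved_chunks,
                  user_message, budget_chars=4000):
    """Assemble the context an agent sends with every turn.
    The model itself is stateless; everything it 'knows' arrives here."""
    sections = [
        ("system", system_prompt),
        ("profile", user_profile),
        ("memory", "\n".join(memory[-5:])),          # recent conversation memory
        ("retrieved", "\n".join(retrieved_chunks)),  # vector-search hits, re-ranked upstream
        ("user", user_message),
    ]
    prompt, used = [], 0
    for name, text in sections:
        piece = f"[{name}]\n{text}\n"
        if used + len(piece) > budget_chars:
            break  # crude truncation; real systems rank and trim per section
        prompt.append(piece)
        used += len(piece)
    return "".join(prompt)

# Hypothetical turn: notice how much personal data travels with a one-line question.
ctx = build_context(
    system_prompt="You are a helpful assistant.",
    user_profile="Name: Geoff. Interests: privacy, running.",
    memory=["user: hello", "assistant: hi Geoff"],
    retrieved_chunks=["Geoff uploaded blood-test results last week."],
    user_message="Summarise my recent data.",
)
```

The privacy point falls straight out of the code: the profile, memory and retrieved chunks are the sensitive payload, and they are re-sent on every single interaction.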

Geoff extended the point. He keeps Markdown files containing enough information about himself that a model could plausibly clone his voice and judgement. Imagine, he suggested, a future where those context files are privacy-preserving, portable and verifiable. You take your context with you, between models, between providers, with cryptographic guarantees that nobody, including the model vendor, has tampered with it. Juliun, who already runs a local Lance database with a local embedding model and reranker for exactly this reason, agreed: this is where things have to go.

File integrity was the other thread here. As more of our work passes through AI tools, the question of who, or what, was the last editor becomes harder to answer. Was it me? Was it Claude? Did the file change in a way I did not notice? Anyone who has spent time vibe-coding or producing content with an AI in the loop has felt this. Provable provenance, version integrity and timestamping are no longer abstract concerns for cryptographers. They are becoming everyday workflow problems.
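The "who was the last editor" problem has a well-worn primitive behind it: hash the content, record who touched it and when, and compare later. A minimal Python sketch, using stdlib hashing only (the fingerprint fields and example text are illustrative; a real provenance system such as C2PA adds signed manifests on top of this idea):

```python
import hashlib
from datetime import datetime, timezone

def fingerprint(content: bytes, editor: str) -> dict:
    """Record who (or what) last touched the content, and a hash proving what it said."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        "editor": editor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

draft = b"# My article\nHuman-written paragraph."
record = fingerprint(draft, editor="me")

# Later: did the file change since I last fingerprinted it?
edited = draft + b"\nAI-suggested paragraph."
unchanged = hashlib.sha256(edited).hexdigest() == record["sha256"]
```

Anchoring the fingerprint somewhere tamper-evident, such as a chain, is what turns this from a local sanity check into provable provenance.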

Monetisation: a two-way street

The conversation closed on the question of money. Who pays whom, in a world where data is everywhere and personhood is scarce?

Geoff offered the cleanest framing. There are two distinct strands. The first is the value we get from giving data to AI to do things for us: check our health, plan a renovation, draft a contract. We should pay for that, and most of us already do. Fine. The second is the value AI extracts from our data when it uses it for something else, training, resale, profiling. In that direction, the flow should reverse. There should be a simple toggle, with the user deciding when their data leaves the private context and on what terms. The relationship, as it stands today, only flows one way. It needs to flow both.

Nick took a more infrastructural view. In a world where every person and organisation is a node on a network, you need the right incentives, and more importantly the right disincentives, to keep the network honest. Payment gateways, the kind of work Coinbase and others are now doing around agent-to-agent payments, are probably the simplest mechanism for this. The first utility of monetisation may not be paying creators for their content. It may be making bad behaviour expensive enough that the network self-polices.

Juliun closed on the most optimistic note of the hour. The power, he argued, is quietly shifting back to humans. AI has eaten the world’s available training data, and now its appetite is starting to outrun the supply. He cited a recent example: an artist working with one of the largest tech companies, paid five hundred thousand dollars a day to generate 3D models the company could not source authentically anywhere else. Real human work, attributable to a real human, is becoming the scarcest input in the AI economy. The defensible asset is not data. It is personhood.

Where this leaves Ontology

Several of the threads in this conversation, from AI data privacy to content provenance, map directly onto what Ontology has been building. Decentralised identity that proves a person is a person. Verifiable credentials that let an individual consent, audit and revoke. ONTO Wallet as the place where users can carry their identity, their data and their reputation across applications without surrendering custody to a hyperscaler. On-chain anchoring for content provenance and integrity, of the kind C2PA is reaching for but struggling to keep attached to the underlying content.

The conversation also surfaced the threads we want to push harder on in the months ahead: the public-versus-private privacy distinction, the consent-audit-revocation framing, and the case for portable, verifiable AI context as the next privacy frontier. Each of those will get its own dedicated piece, and we will link them from here as they publish.

If you want to follow the work more directly, ONTO Wallet is the consumer-facing front door. The Ontology Network blog and newsletter are where the deeper infrastructure and ecosystem updates land. And the Privacy Hour will be back.

Watch the episode

Privacy, Data and the Future of AI Data, on YouTube

Follow-up pieces (linked as they publish)

  • Peak Human Content: why 2025 was the line
  • Public Privacy versus Private Privacy
  • We need a new bucket: rethinking the honeypot
  • From ownership to consent, audit and revocation
  • The Bloomberg Signal: attribution as the asset
  • C2PA is great, but it is fragile: where blockchain fits
  • Context is the new secret sauce
  • File integrity in the age of vibe coding
  • AI ate the world: the power shift back to humans

Guest links

Juliun: @juliun_b | @stabilityinc
Nick Ris: @nickris | Mission on LinkedIn
Humpty Calderon: @humpty0x
Geoff Richards: @GeoffTRichards | @OntologyNetwork