AI Chatbot Privacy: Public versus Private

The AI chatbot privacy gap: public data is regulated, private AI disclosures are not

AI chatbot privacy is not one problem. It is two, and they are being treated as if they are the same. They are not. One has been debated for two decades, legislated across dozens of jurisdictions, and turned into a mature industry of consent banners, subject access requests and data protection officers. The other is barely understood, rarely regulated, and growing faster than anything the first category has had to absorb.

The first is public privacy: the data you broadcast. Posts, photos, reviews, cookies, the behavioural trail left behind by every app on your phone. The second is private privacy: what you hand directly to an AI model in a prompt, an uploaded file, a context window, a memory store. One is shouted into a crowd. The other is whispered into the ear of a system you do not own.

This is the distinction Geoff raised during the recent Ontology Privacy Hour, and it is one of the most useful framings to come out of that conversation. Because if you are trying to think clearly about what personal data protection looks like in the age of AI, collapsing the two categories together is the fastest way to get it wrong.

Public privacy is the category we learned to debate

Public privacy is a solved problem, at least in the sense that we have agreed there is a problem. The European Union’s General Data Protection Regulation took effect in 2018, giving hundreds of millions of people rights of access, rectification, erasure and portability over data that companies hold about them. California followed with the CCPA in 2020. The United Kingdom’s Data Protection Act, Brazil’s LGPD and a growing number of state-level and national frameworks now cover most of the world’s major economies.

None of this is perfect. Enforcement is patchy, consent banners are a joke, and the data broker industry is still larger and less transparent than the rules that govern it. But the conceptual work is done. If a company publishes a tracking pixel on a website, collects an email address in a newsletter form, or shares a customer record with a third party, there is a clear framework for what it is allowed to do, what it must disclose, and what the individual can demand in return. Two decades of argument, litigation and lobbying have produced a working vocabulary.

That vocabulary assumes the data flows outwards, from the person to the platform to the advertiser to the broker. The regulator’s job is to keep the flow visible and to put limits on what the downstream parties can do with it. That mental model has shaped almost every major privacy framework in force today.

Private privacy is the category nobody is watching

Now consider the other flow: the one that has quietly become the largest channel of personal disclosure in human history.

In February 2026, OpenAI reported that ChatGPT had reached 900 million weekly active users, up from 400 million a year earlier. That is one product, from one company. Add Claude, Gemini, Copilot, Perplexity, Grok, Meta AI and the long tail of specialist assistants, and the number of people typing things into a model every week is comfortably over a billion.

What they are typing is not small talk. A 2025 Stanford study on chatbot privacy risks found that users routinely share sensitive personal information with general-purpose assistants, including health details, financial records, legal questions and workplace material. Separately, reporting in The Register in March 2026 documented healthcare staff pasting identifiable patient data directly into consumer chatbots, with real names, dates of birth and diagnosis codes ending up in commercial training corpora.

Little of this fits comfortably into the framework that handles public privacy. GDPR technically applies to personal data however it is collected, but it was written for a world in which the relationship between a person and a piece of software was mostly transactional. The private privacy category is different: the user volunteers the data, in context, to get a specific answer, and often has no idea whether that data is retained, trained on, or re-surfaced later.

A blood test, in a prompt box

The example Geoff used during the Privacy Hour is a clean illustration. A recent set of blood tests, uploaded into a Claude project so the model could help interpret the results. Useful, immediate, transformative even. And also: handed over without any clear sense of where that data goes, how long it is retained, what it is used for, or whether any part of it will end up in a training run.

The answers are discoverable, but only if you know where to look. OpenAI’s data controls live in a help-centre article, and the specific setting for opting out of model training is buried another click deeper. Similar pages exist for every major provider. Almost no one reads them.

The point is not that these companies are acting in bad faith. Most of them publish their policies clearly and most of them offer opt-outs. The point is that the default state of a user’s private disclosures to AI is opaque, retained, and, unless the user has actively dug into the settings, eligible to inform future models. Compare that to the public privacy world, where default settings are increasingly shaped by regulators. The gap is structural.

Why AI chatbot privacy is growing faster

Three things are driving the volume upward, and each one widens the surface.

Memory. Every major assistant now keeps persistent memory about its users. What was once an ephemeral conversation becomes a stored profile, built up across weeks and months, with the user rarely prompted to review what it contains. The assistant gets better. The retention grows.

Agents. The shift from chat to agent, from a box you type into to a process that acts on your behalf, pushes more private data into the context layer. An agent booking travel, drafting contracts or arranging a doctor’s appointment needs access to more of your life than a chat session. Every agent integration adds another channel of private disclosure.

Enterprise adoption. Company deployments of LLMs now handle HR records, legal documents, customer support transcripts and internal code. Even where business-tier products promise not to train on inputs by default, the data is still processed, logged and retained under the provider’s terms. Regulation here is partly keeping pace: the EU AI Act’s transparency obligations for general-purpose models have applied since August 2025, and from August 2026 Article 10 imposes data governance obligations on high-risk systems. But the bulk of consumer-scale private disclosure sits outside that scope.

What the answer has to look like

Getting private privacy right is partly a policy problem and partly an infrastructure problem. The policy side will follow its own pace. The infrastructure side is where the more interesting work is.

The framing Nick Ris offered during the Privacy Hour transfers directly here. The useful unit of control is not ownership. It is consent, audit and revocation. Can I give permission for this specific piece of data to be used, in this context, for this purpose? Can I see where it has been used? Can I take that permission back?

Applied to public privacy, this is familiar. Applied to private privacy, it is almost entirely missing. There is no standard mechanism today for a user to say: this prompt, this file, this conversation is usable by this model, for this session, and not otherwise. There is no portable audit trail that shows which providers have seen which inputs. There is no revocation primitive that reaches into the retained context of a dozen different assistants.
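To make the gap concrete, here is a minimal sketch, in TypeScript, of what those three primitives could look like as data structures. Everything in it is hypothetical: the field names, the purposes and the functions are invented for illustration, not a description of any existing provider API or Ontology product.

```typescript
// Hypothetical sketch of the missing primitives: a scoped consent grant,
// an audit trail, and revocation. All names and shapes are illustrative.

type Purpose = "answer-this-session" | "personalisation" | "model-training";

interface ConsentGrant {
  grantId: string;
  subject: string;     // the person granting consent (e.g. a DID)
  provider: string;    // which assistant may use the data
  artefact: string;    // content hash of the prompt, file or conversation
  purposes: Purpose[]; // what the data may be used for
  expires: Date;       // e.g. the end of the session
  revoked: boolean;
}

interface AuditEntry {
  grantId: string;
  provider: string;
  purpose: Purpose;
  usedAt: Date;
}

const grants = new Map<string, ConsentGrant>();
const auditLog: AuditEntry[] = [];

// "Can I give permission for this piece of data, in this context, for this purpose?"
function grant(g: ConsentGrant): void {
  grants.set(g.grantId, g);
}

// The provider checks the grant before touching the data, and every use
// lands in a trail the user can read back later.
function use(grantId: string, provider: string, purpose: Purpose): boolean {
  const g = grants.get(grantId);
  if (!g || g.revoked) return false;
  const allowed =
    g.provider === provider &&
    g.purposes.includes(purpose) &&
    g.expires.getTime() > Date.now();
  if (allowed) auditLog.push({ grantId, provider, purpose, usedAt: new Date() });
  return allowed;
}

// "Can I take that permission back?" — revocation has to reach retained context too.
function revoke(grantId: string): void {
  const g = grants.get(grantId);
  if (g) g.revoked = true;
}
```

The point of the sketch is the shape, not the code: a grant scoped to an artefact, a purpose and an expiry; a check that refuses anything outside that scope; a log the user can inspect; a revocation that actually flips the switch.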

This is where the infrastructure Ontology has been building matters. Decentralised identity that proves a person is a person. Verifiable credentials that carry specific, scoped claims without disclosing the underlying data. ONTO Wallet as the place where an individual can hold those credentials and govern them from a single point of control. These are not speculative capabilities. They are the primitives any serious answer to AI chatbot privacy will have to rest on.
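As a rough illustration of what “scoped claims without disclosing the underlying data” means, here is a hypothetical credential shaped loosely on the W3C Verifiable Credentials data model. The issuer, identifiers and claim are invented for the blood-test example above; this is not the ONTO Wallet format.

```typescript
// Illustrative credential, loosely modelled on the W3C Verifiable Credentials
// data model: it asserts one scoped claim without carrying the underlying record.
const labResultCredential = {
  "@context": ["https://www.w3.org/2018/credentials/v1"],
  type: ["VerifiableCredential", "LabResultSummaryCredential"], // hypothetical type
  issuer: "did:example:clinic-123",               // placeholder issuer DID
  issuanceDate: "2026-03-01T00:00:00Z",
  credentialSubject: {
    id: "did:example:patient-456",                // the holder's identifier
    claim: "cholesterol-within-reference-range",  // the scoped fact, not the raw results
  },
  // In a real credential, a cryptographic proof from the issuer goes here.
};
```

The idea is that only the claim needed for the task travels; the underlying record stays with the holder.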

The next frontier

Public privacy shaped the first two decades of the consumer internet. Most of the institutions, laws and user expectations we have today were built around it. That work is far from done, but the direction is clear and the vocabulary is in place.

Private privacy will shape the next decade, and the work has barely started. The volume of personal disclosure going into AI systems is already larger than any category public privacy has ever had to handle, and it is growing faster than the regulation that will eventually catch up. The users doing the disclosing are, for the most part, not asking the questions. The providers receiving the data are operating on terms that sit outside the mental model most people use for their digital lives.

Closing the gap does not mean slowing AI down. These tools are genuinely transformative, and restricting them is not the point. The point is that the fastest-growing category of personal disclosure in the history of computing should not also be the least understood. Public privacy gave us the playbook. Private privacy is the frontier.

This article is part of a series expanding on themes from the Ontology Privacy Hour: Privacy, Data and the Future of AI Data. Watch the full episode on YouTube.