6 April 2026 · Nick Finch
Hallucination is the feature
The same mechanism that makes AI dangerous in production is what makes it valuable everywhere else. The difference isn't the model. It's the constraints you put around it.
A post crossed my LinkedIn feed recently that stopped me scrolling. A developer admitted, publicly, that he sometimes deliberately wants AI to hallucinate. Not on production code. But when exploring ideas, building landing pages, pushing past what he would have come up with on his own, he actively wanted the model to generate things that did not exist yet.
It got me thinking about how misleading the word “hallucination” has become. We gave this behaviour a name that implies malfunction, a clinical term borrowed from psychiatry that suggests the system is broken. But the behaviour it describes, a language model generating something not present in its source material, could just as easily be called creativity. The word we chose has shaped the entire industry debate, and not in a helpful direction.
The people treating hallucination as a fatal flaw and the people deliberately inviting it are looking at the same capability. The difference is not the model. It is the constraints around it.
What hallucination actually is
Hallucination is not a bug. It is not a glitch that will be patched in the next release. It is the generative capability itself. The mechanism that fabricates a legal citation that does not exist is the same mechanism that designs a drug compound that has never been synthesised. The mechanism that invents a navigation instruction for a menu bar that is not there is the same mechanism that surfaces an engineering approach you would not have considered.
OpenAI’s own researchers proved this mathematically in September 2025. Hallucinations are inherent to how these models generate language. The training and evaluation procedures reward guessing over acknowledging uncertainty. Nine out of ten major benchmarks use binary grading that gives zero points for “I don’t know” while rewarding confident guesses. The optimal strategy, from the model’s perspective, is always to guess. OpenAI’s own reasoning models, o3 and o4-mini, hallucinated 33% and 48% of the time, respectively, on person-specific questions. These are models built to work through problems step by step. They hallucinate more than their predecessors, not less.
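The incentive problem is easy to see as arithmetic. This sketch is an illustration of the argument, not code from the OpenAI paper: under binary grading, abstaining scores zero with certainty, so any nonzero chance of being right makes guessing the dominant strategy.

```python
# Illustration (not from the OpenAI paper): expected benchmark score for one
# question under binary grading, where "I don't know" earns 0 and a guess
# earns 1 point with probability p_correct.
def expected_score(p_correct: float, abstain: bool) -> float:
    if abstain:
        return 0.0        # binary grading gives zero credit for abstaining
    return p_correct      # a guess pays off with probability p_correct

# Even a long-shot guess strictly beats abstaining...
assert expected_score(0.1, abstain=False) > expected_score(0.1, abstain=True)
# ...and guessing is never worse, so training optimises for confident guesses.
assert expected_score(0.0, abstain=False) >= expected_score(0.0, abstain=True)
```

A grading scheme that penalised wrong answers (say, -1 for an error) would change the calculus, which is exactly why the benchmark design matters.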
A Carnegie Mellon study published in Memory & Cognition in 2025 added another dimension. LLMs are systematically overconfident. Unlike humans, who adjust their confidence estimates downward after being shown they were wrong, language models remain or become more overconfident even after errors. As researcher Danny Oppenheimer put it, AI “asserts the answer with confidence, even when that confidence is unwarranted.”
They do not just get it wrong. They get it wrong with conviction. And they never learn to doubt themselves.
If these models could only retrieve and recombine existing information, they would just be search engines. The ability to generate things that have never existed is the core value proposition. In drug discovery, Insilico Medicine’s AI-designed compound for idiopathic pulmonary fibrosis reached Phase II clinical trials, identified in roughly twelve months versus the typical four to six years. The entire field of generative chemistry depends on models producing molecules that have never been synthesised. That is hallucination, pointed in a useful direction.
Why it is genuinely dangerous
None of this diminishes the risk. The consequences of uncontrolled hallucination are real, documented, and growing.
Damien Charlotin maintains a database of legal decisions involving hallucinated content, now exceeding 150 cases across international jurisdictions. Courts have imposed sanctions, fines, and standing orders requiring disclosure of AI use. A Deloitte report submitted to the Australian government contained fabricated academic sources and a fake court quote, costing A$440,000. A separate Deloitte report for the Newfoundland government included at least four non-existent research papers.
Therapy chatbots have given dangerous advice to vulnerable users. AI travel planning has sent tourists to fictitious locations. Customer service agents have confidently cited policies that do not exist.
The pattern is always the same. The model generates something plausible, delivers it with authority, and the human on the receiving end has no signal that anything is wrong. The confidence paradox makes this worse. The model sounds most authoritative precisely when it is most likely to be fabricating.
For anyone building systems where accuracy matters, where the output drives decisions, where wrong answers have consequences, this is not something to dismiss. It is something to engineer around.
The brilliant consultant
Here is the bridge between danger and value.
You would not give a brilliant consultant the keys to your production systems on day one, no matter how impressive they were in the interview. You would start with advisory. You would validate their recommendations against your own knowledge. You would watch how they handle edge cases. You would build trust through evidence, not assumption. And gradually, as confidence grew through demonstrated competence, you would give them more autonomy.
That is not an AI concept. That is just good management. It is how experienced engineers have always managed risk with external expertise. The brilliance is exactly why you hired them. The constraints are what make the brilliance safe to use.
Language models are the same. The generative capability is the brilliance. The constraints are what determine whether that brilliance produces value or damage. And the constraint is not a binary switch. It is a dial.
Constraint engineering in practice
We are building a database engineering expert system at inmydata right now, and hallucination is the problem we have engineered around most carefully.
The system captures thirty years of database administration expertise. Documentation, presentations, diagnostic knowledge, the accumulated judgement of specialists who have been solving complex database problems for decades. All of it sits in our knowledge base. An agent on the front end retrieves from that knowledge base and answers questions.
The agent is explicitly constrained to answer from information retrieved through our RAG pipeline. When nothing relevant surfaces above our quality threshold, it is instructed to say “I don’t know.” Not to guess. Not to reason from general knowledge. Not to offer something plausible. The prompts, the retrieval architecture, and the quality gates are all designed to enforce that boundary. To say, clearly, that it does not have the information to answer.
This is not infallible. We wrote about this in our context engineering post. During a live demo, the agent was told not to guess and it guessed anyway. The instruction was clear. The model ignored it. That experience drove us to build architectural safeguards, quality gates, secondary classifiers, structured retrieval, rather than relying on prompts alone.
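The shape of that architectural safeguard can be sketched in a few lines. This is a hypothetical illustration of the pattern, not inmydata's actual code: the names, scores, and threshold are assumptions. The point is that the "I don't know" path is enforced in code, before the model is ever asked to answer, rather than left to the prompt.

```python
# Hypothetical sketch of a retrieval quality gate. Names and the threshold
# value are illustrative assumptions, not a real system's configuration.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Passage:
    text: str
    score: float  # retrieval relevance score in [0, 1]

QUALITY_THRESHOLD = 0.75  # assumed cutoff; tuned per corpus in practice

def build_context(passages: List[Passage]) -> Optional[str]:
    """Return grounded context for the agent, or None if nothing clears the gate."""
    usable = [p for p in passages if p.score >= QUALITY_THRESHOLD]
    if not usable:
        return None  # the architectural "I don't know": the model is never asked
    return "\n\n".join(p.text for p in usable)

def answer(passages: List[Passage]) -> str:
    context = build_context(passages)
    if context is None:
        # Enforced in code, not just instructed in the prompt.
        return "I don't know."
    return f"[answer grounded only in]: {context}"
```

Because the refusal happens before generation, there is no instruction for the model to ignore, which is the lesson the live-demo failure taught.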
But the constraint architecture is only half the story. The other half is phased trust-building.
The agent starts as an internal tool for the company’s own engineers. It retrieves information, answers questions, helps with diagnostics. Read-only. No actions.
When the engineers trust the agent’s judgement, it becomes customer-facing, but still read-only. Customers can ask it questions. It cannot touch anything.
Next, the agent starts suggesting actions. An engineer reviews every suggestion before anything is executed. The agent recommends. The human decides.
Then the agent suggests actions directly to customers, who can choose to execute them. The human is still in the loop, but the loop has widened.
Only after sustained evidence that the agent’s suggestions are consistently correct, well-scoped, and safe do we allow autonomous execution. And even then, we start with low-impact actions and gradually build up to operations that carry more weight. Everything is audited. The kill switch is always there.
Each stage generates evidence that informs the next. The audit trail from stage one tells you whether the agent is ready for stage two. Trust is not a configuration setting. It is accumulated proof.
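The stages above can be encoded as a simple policy, which makes the widening loop explicit. This is a hypothetical sketch with illustrative stage names, not the real system's implementation: execution is impossible in the read-only stages, requires human approval in the middle stages, and is permitted without approval only at the final, audited stage.

```python
# Hypothetical encoding of the phased trust ladder described above.
# Stage names are illustrative assumptions, not a real product's API.
from enum import IntEnum

class AutonomyStage(IntEnum):
    INTERNAL_READ_ONLY = 1    # internal engineers only, no actions
    CUSTOMER_READ_ONLY = 2    # customer-facing, still no actions
    SUGGEST_TO_ENGINEER = 3   # an engineer approves every suggestion
    SUGGEST_TO_CUSTOMER = 4   # the customer chooses to execute
    AUTONOMOUS_LOW_IMPACT = 5 # audited execution, kill switch retained

def may_execute(stage: AutonomyStage, approved_by_human: bool) -> bool:
    """Execution requires human approval until the final stage."""
    if stage < AutonomyStage.SUGGEST_TO_ENGINEER:
        return False                 # read-only stages: never execute
    if stage < AutonomyStage.AUTONOMOUS_LOW_IMPACT:
        return approved_by_human     # the human stays in the loop
    return True                      # autonomy earned through evidence
```

Promotion from one stage to the next is a human decision driven by the audit trail, which is why trust here is accumulated proof rather than a configuration setting.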
This is the hallucination dial turned almost all the way down. Context is tightly curated. Retrieval is bounded. Quality gates filter borderline results. The model’s generative capability is constrained to a narrow domain where we can validate its outputs against known reality. In that setting, hallucination is a risk to be managed, and we manage it through architecture, not hope.
The other end of the dial
When I am exploring engineering approaches for a new system, reasoning through how to structure a complex integration, or working through a problem I have not solved before, I want something very different from the model. I want it to push past what I would naturally consider. I want it to suggest approaches I have not thought of, to make connections across domains, to surface ideas that challenge my assumptions.
That is not reckless. It is intentional. The context is different. I am not deploying the output to a production system. I am using the model as a cognitive partner in an exploration phase where my own judgement, experience, and verification are the quality gate. The model’s generative capability, its willingness to produce things that do not yet exist, is exactly what makes it useful in that setting.
The same mechanism. The same model. Completely different constraint position. In one context I am doing everything I can to prevent the model from generating beyond its source material. In the other I am actively inviting it to do exactly that. Both are rational. Both are disciplined. The discipline is in knowing which setting you are in and engineering accordingly.
You already know how to think about this
If you have spent years building production software, you have been managing this kind of tension for your entire career. You have built validation layers because users do unexpected things. You have implemented quality gates because data arrives in formats nobody anticipated. You have phased rollouts because deploying everything at once is reckless regardless of how well it tested. You have human approval workflows because some actions are too consequential to automate without oversight.
The hallucination problem is not a new category of challenge. It is the same discipline you have always applied, translated to a system that generates language instead of processing transactions.
The people waiting for hallucination to be “solved” before they build with AI are waiting for something that will not arrive. OpenAI’s own research shows it is structural, not a defect on a path to elimination. But the people dismissing hallucination as a fatal flaw are equally wrong. The generative capability that produces hallucinations is the capability that makes these models valuable. Eliminating it would not make AI safer. It would make AI useless.
The real skill is understanding the dial. In production systems where accuracy is non-negotiable, you constrain tightly. Curated context, quality gates, phased trust-building, human oversight. In exploratory work where you want the model to push past what you already know, you loosen deliberately. Same capability. Same mechanism. Different constraints. Different outcomes.
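One way to make the dial concrete is as a pair of constraint profiles. This is an illustrative sketch, not a real framework's configuration: the parameter names are assumptions, chosen to mirror the two settings described above.

```python
# Hypothetical "dial" expressed as configuration. Parameter names are
# illustrative assumptions, not a real library's API.
from dataclasses import dataclass

@dataclass(frozen=True)
class ConstraintProfile:
    grounded_retrieval_only: bool   # must every claim come from retrieved sources?
    allow_idk: bool                 # is "I don't know" an acceptable answer?
    human_approval_required: bool   # does a person gate every action?

# Dial turned down: production, where accuracy is non-negotiable.
PRODUCTION = ConstraintProfile(
    grounded_retrieval_only=True,
    allow_idk=True,
    human_approval_required=True,
)

# Dial turned up: exploration, where your own judgement is the quality gate.
EXPLORATION = ConstraintProfile(
    grounded_retrieval_only=False,
    allow_idk=False,
    human_approval_required=False,
)
```

Same model behind both profiles; the engineering decision is which profile the current context calls for.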
The industry gave this behaviour a name that implies malfunction. It is not a malfunction. It is the core value of these systems. The engineering challenge, the one your career has prepared you for, is knowing when to constrain it and when to let it run.