The Case for Slower AI
LLMs come in for plenty of criticism: excessive verbosity, over-hedging, sycophancy, formulaic bullet-point structures, confident hallucination, losing context in long conversations, corporate blandness, repetitive phrasing, and ignoring instructions. But consider the following conversation using Claude Opus 4.5 via the Anthropic Console, where we query the model directly rather than going through a chat application’s surrounding stack:
ME: In less than 30 words, explain general relativity
CLAUDE: Massive objects curve the fabric of spacetime around them. Other objects follow this curvature, which we perceive as gravity. Time also passes slower in stronger gravitational fields.
Granted, this is an easy question for an LLM. But because we queried the model directly, we know this was its unedited output, generated in a fraction of a second. How would the average person perform, given the same question and told to answer immediately with their first, instant response? It’s hard to imagine anyone coming close to the quality of Claude’s answer. To answer the question properly, we’d expect time to think: time to write a first draft, count the words, and revise. Given that opportunity, a good many people could craft a similar, maybe even better, response.
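For anyone who wants to reproduce the exchange outside the Console, a minimal sketch using Anthropic’s Python SDK follows; the model identifier here is a placeholder, so check Anthropic’s published model list for the exact string.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-5",  # placeholder: substitute the exact model ID Anthropic publishes
    max_tokens=100,
    messages=[
        {"role": "user", "content": "In less than 30 words, explain general relativity"}
    ],
)
print(message.content[0].text)
```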
It turns out that, like us, LLMs give better answers when they have time to think, to consider their work, and to rewrite where they see fit. It’s one of the core features of agentic AI: giving language models the tools to gather and weigh information as they formulate a response. Spend any time with coding agents and the importance of planning before writing code becomes very apparent. My own learned approach to getting the best out of coding agents illustrates this perfectly:
1. Manually write an outline plan of what you want to build, clearly defining key goals, architectural decisions, and technology choices.
2. Ask the coding agent to build a detailed, phased implementation plan, written to a markdown file.
3. Start a new chat session (with an empty context) and ask the coding agent to read, understand, and critique the plan. You can optionally ask it to suggest three improvements, although many coding agents seem to have this built into their prompts.
4. Read through the critique and suggestions, and ask the agent to update the markdown plan with the points you agree with.
5. Repeat steps 3 and 4 until you are happy with the plan. I will usually repeat these two steps at least three times, often many more.
6. Ask the agent to implement each phase of the plan. After each phase, ask it to stop, consider the work it has done, and apply any improvements.
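To make the shape of that loop concrete, here is a rough structural sketch in Python. The `ask_agent()` helper is hypothetical, standing in for whichever coding agent you drive (CLI, API, or IDE integration), and the prompts are illustrative rather than a prescribed wording.

```python
# A structural sketch of the plan-critique-implement loop above.
# ask_agent() is a hypothetical helper standing in for whichever coding
# agent you use; fresh_context=True represents starting a new chat session.

def ask_agent(prompt: str, fresh_context: bool = False) -> str:
    """Hypothetical wrapper around your coding agent of choice."""
    raise NotImplementedError

def refine_plan(outline: str, rounds: int = 3) -> None:
    # Step 2: turn the hand-written outline into a phased plan on disk.
    ask_agent(
        "Build a detailed, phased implementation plan from this outline "
        f"and write it to PLAN.md:\n\n{outline}"
    )
    # Steps 3-5: critique in a fresh context, then fold in the accepted points.
    for _ in range(rounds):
        critique = ask_agent(
            "Read PLAN.md, critique it, and suggest three improvements.",
            fresh_context=True,
        )
        print(critique)                        # the human reads the critique...
        accepted = input("Points to apply: ")  # ...and decides what to keep
        ask_agent(f"Update PLAN.md to incorporate: {accepted}")

def implement(phases: int) -> None:
    # Step 6: implement phase by phase, pausing to reflect after each one.
    for n in range(1, phases + 1):
        ask_agent(f"Implement phase {n} of PLAN.md.")
        ask_agent(f"Review the work completed in phase {n} and apply any improvements.")
```

The point of the sketch is the structure, not the prompts: the plan lives in a file so it survives context resets, and the critique always happens in a fresh session so the agent isn’t grading its own homework from memory.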
With this sort of approach, and modern models like Gemini Pro or Opus 4.5, we have successfully implemented large-scale engineering tasks in just a few days.
So clearly, many of the shortcomings we initially experienced with LLMs came from expecting them to work in ways we would never expect a person to. Indeed, when afforded the same opportunity we give ourselves, time to think, to consider and critique their own work, and to correct themselves, those shortcomings fade. Under these conditions, the models reveal just how capable they are.
But if giving one model time to think improves results, what happens when you add more minds to the process? Andrej Karpathy's LLM Council illustrates this concept perfectly. The idea is elegantly simple. Instead of querying one LLM, you assemble a council of them. GPT, Gemini, Claude, Grok, whatever combination you prefer. Each model receives your question independently and produces its response. Then comes the interesting part. Each model reviews the anonymised responses from the others, ranking them for accuracy and insight. Finally, a designated "chairman" model synthesises everything into a single consolidated answer.
It's peer review for AI. The same process that underpins scientific publication, academic assessment, and professional quality control, applied to language model outputs.
The anonymisation is a clever touch. Without knowing which response came from which model, the reviewers can't play favourites. They must evaluate purely on merit. This mirrors double-blind peer review in academia, where reviewers don't know the author's identity and vice versa.
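To make the flow concrete, here is a rough sketch of the council loop. The `query_model()` helper is a hypothetical stand-in for each provider’s API, the prompts are illustrative, and Karpathy’s actual implementation differs in the details.

```python
import random

# A sketch of the council flow: independent answers, anonymised peer review,
# then a chairman synthesis. query_model() is a hypothetical helper that sends
# a prompt to the named provider and returns its text response.

COUNCIL = ["gpt", "gemini", "claude", "grok"]
CHAIRMAN = "claude"

def query_model(model: str, prompt: str) -> str:
    raise NotImplementedError  # wire up to each provider's API

def council_answer(question: str) -> str:
    # 1. Each council member answers the question independently.
    answers = {m: query_model(m, question) for m in COUNCIL}

    # 2. Anonymise: shuffle the answers and strip the model names.
    items = list(answers.items())
    random.shuffle(items)
    bundle = "\n\n".join(
        f"Response {i + 1}:\n{text}" for i, (_, text) in enumerate(items)
    )

    # 3. Each member ranks the anonymised responses for accuracy and insight.
    reviews = [
        query_model(
            m,
            f"Question: {question}\n\n{bundle}\n\n"
            "Rank these responses from best to worst, explaining your reasoning.",
        )
        for m in COUNCIL
    ]

    # 4. The chairman synthesises a single consolidated answer.
    return query_model(
        CHAIRMAN,
        f"Question: {question}\n\nResponses:\n{bundle}\n\nReviews:\n"
        + "\n\n".join(reviews)
        + "\n\nSynthesise the best single answer.",
    )
```

Shuffling before labelling is what keeps the review blind: by the time any model sees the bundle, the provenance of each response has already been discarded.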
Karpathy describes the project as a weekend hack, built while exploring different LLMs for reading books together. He's released it as-is, explicitly stating he won't maintain or improve it. "Code is ephemeral now and libraries are over," he writes. "Ask your LLM to change it in whatever way you like." A statement that rather proves the point about how capable these tools have become.
The repository has attracted nearly ten thousand stars in short order. Clearly the idea resonates. And why wouldn't it? We've long understood that diverse perspectives improve decision-making. Multiple reviewers catch errors that individuals miss. Ensemble methods outperform single classifiers in machine learning. The same logic applies here.
The implications for agentic AI workflows are tantalising. Imagine coding agents that don't just plan and critique their own work, but consult a council of models at key decision points. Technology choices reviewed by multiple perspectives before implementation begins. Code critiqued by different models, each catching different classes of error. Documentation assessed for clarity by minds trained on different corpora. The ensemble approach could slot naturally into the kind of phased, reflective workflows that already produce the best results.
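Purely as a thought experiment, such a council check could slot into the phased workflow sketched earlier. The snippet below reuses the hypothetical `ask_agent()` and `council_answer()` helpers from the earlier sketches and describes no existing tool.

```python
# A thought experiment only: a council review gate inside the phased workflow,
# reusing the hypothetical ask_agent() and council_answer() helpers from above.

def implement_with_council(phases: int) -> None:
    for n in range(1, phases + 1):
        # Ask the agent to commit to an approach before writing any code.
        proposal = ask_agent(
            f"Describe your intended approach for phase {n} of PLAN.md, "
            "including technology choices, before implementing anything."
        )
        # Have the council review the approach at this decision point.
        verdict = council_answer(
            "Review this implementation approach and flag risks, gaps, "
            f"or better alternatives:\n\n{proposal}"
        )
        # Implement with the review in hand, then reflect as before.
        ask_agent(f"Implement phase {n}, taking this review into account:\n\n{verdict}")
        ask_agent(f"Review the work completed in phase {n} and apply any improvements.")
```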
At inmydata, we're exploring these patterns in our work on expert systems for demand prediction and buyer assistance, where the cost of a wrong answer is measured in real money. Having multiple models cross-check forecasts and recommendations before they reach a buyer feels less like a luxury and more like due diligence.
We are, I suspect, only beginning to grasp how to work effectively with these tools. A year ago, the idea of a coding agent implementing a complex system in days seemed fanciful. Now it's routine for those who've learned the patterns. The models improve relentlessly, but perhaps more importantly, our understanding of how to use them deepens with each project. Those shortcomings we blamed on the technology? Turns out they were mostly ours.