The Case for Slower AI
LLMs come in for plenty of criticism: excessive verbosity, over-hedging, sycophancy, formulaic bullet-point structures, confident hallucination, losing context in long conversations, corporate blandness, repetitive phrasing, and ignoring instructions. But consider the following conversation using Claude Opus 4.5 via the Anthropic Console, where we query the model directly rather than going through a chat application’s surrounding stack:
ME: In less than 30 words, explain general relativity
CLAUDE: Massive objects curve the fabric of spacetime around them. Other objects follow this curvature, which we perceive as gravity. Time also passes slower in stronger gravitational fields.
Granted, this is an easy question for an LLM. But because we queried the model directly, we know this was its unedited output, generated in a fraction of a second. How would the average person perform, given the same question and told to answer immediately with their first, instant response? It’s hard to imagine anyone coming close to the quality of Claude’s answer. To answer the question properly, we’d expect time to think: time to write a first draft, count the words, and revise. Given that opportunity, a good many people could craft a similar, maybe even better, response.
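For anyone who wants to reproduce the exchange outside the Console, a minimal sketch using Anthropic’s Python SDK follows; the model identifier here is a placeholder, so check Anthropic’s published model list for the exact string.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-5",  # placeholder: substitute the exact model ID Anthropic publishes
    max_tokens=100,
    messages=[
        {"role": "user", "content": "In less than 30 words, explain general relativity"}
    ],
)
print(message.content[0].text)
```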
It turns out that, like us, LLMs give better answers when they have time to think, to consider their work, and to rewrite where they see fit. It’s one of the core features of agentic AI: giving language models the tools to gather and weigh information as they formulate a response. Spend any time with coding agents and the importance of planning before writing code becomes very apparent. My own learned approach to getting the best out of coding agents illustrates this perfectly:
1. Manually write an outline plan of what you want to build, clearly defining key goals, architectural decisions, and technology choices.
2. Ask the coding agent to build a detailed, phased implementation plan, written to a markdown file.
3. Start a new chat session (with an empty context) and ask the coding agent to read, understand, and critique the plan. You can optionally ask it to suggest three improvements, although many coding agents seem to have this built into their prompts.
4. Read through the critique and suggestions, and ask the agent to update the markdown plan with the points you agree with.
5. Repeat steps 3 and 4 until you are happy with the plan. I will usually repeat these two steps at least three times, often many more.
6. Ask the agent to implement each phase of the plan. After each phase, ask it to stop, consider the work it has done, and apply any improvements.
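To make the shape of that loop concrete, here is a rough structural sketch in Python. The `ask_agent()` helper is hypothetical, standing in for whichever coding agent you drive (CLI, API, or IDE integration), and the prompts are illustrative rather than a prescribed wording.

```python
# A structural sketch of the plan-critique-implement loop above.
# ask_agent() is a hypothetical helper standing in for whichever coding
# agent you use; fresh_context=True represents starting a new chat session.

def ask_agent(prompt: str, fresh_context: bool = False) -> str:
    """Hypothetical wrapper around your coding agent of choice."""
    raise NotImplementedError

def refine_plan(outline: str, rounds: int = 3) -> None:
    # Step 2: turn the hand-written outline into a phased plan on disk.
    ask_agent(
        "Build a detailed, phased implementation plan from this outline "
        f"and write it to PLAN.md:\n\n{outline}"
    )
    # Steps 3-5: critique in a fresh context, then fold in the accepted points.
    for _ in range(rounds):
        critique = ask_agent(
            "Read PLAN.md, critique it, and suggest three improvements.",
            fresh_context=True,
        )
        print(critique)                        # the human reads the critique...
        accepted = input("Points to apply: ")  # ...and decides what to keep
        ask_agent(f"Update PLAN.md to incorporate: {accepted}")

def implement(phases: int) -> None:
    # Step 6: implement phase by phase, pausing to reflect after each one.
    for n in range(1, phases + 1):
        ask_agent(f"Implement phase {n} of PLAN.md.")
        ask_agent(f"Review the work completed in phase {n} and apply any improvements.")
```

The point of the sketch is the structure, not the prompts: the plan lives in a file so it survives context resets, and the critique always happens in a fresh session so the agent isn’t grading its own homework from memory.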
With this sort of approach, and modern models like Gemini Pro or Opus 4.5, we have successfully implemented large-scale engineering tasks in just a few days.
So clearly, many of the shortcomings we initially experienced with LLMs came from expecting them to work in ways we would never expect a person to. Indeed, when afforded the same opportunity we give ourselves, time to think, to consider and critique their own work, and to correct themselves, those shortcomings fade. Under these conditions, the models reveal just how capable they are.
But if giving one model time to think improves results, what happens when you add more minds to the process? Andrej Karpathy's LLM Council illustrates this concept perfectly. The idea is elegantly simple. Instead of querying one LLM, you assemble a council of them. GPT, Gemini, Claude, Grok, whatever combination you prefer. Each model receives your question independently and produces its response. Then comes the interesting part. Each model reviews the anonymised responses from the others, ranking them for accuracy and insight. Finally, a designated "chairman" model synthesises everything into a single consolidated answer.
It's peer review for AI. The same process that underpins scientific publication, academic assessment, and professional quality control, applied to language model outputs.
The anonymisation is a clever touch. Without knowing which response came from which model, the reviewers can't play favourites. They must evaluate purely on merit. This mirrors double-blind peer review in academia, where reviewers don't know the author's identity and vice versa.
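To make the flow concrete, here is a rough sketch of the council loop. The `query_model()` helper is a hypothetical stand-in for each provider’s API, the prompts are illustrative, and Karpathy’s actual implementation differs in the details.

```python
import random

# A sketch of the council flow: independent answers, anonymised peer review,
# then a chairman synthesis. query_model() is a hypothetical helper that sends
# a prompt to the named provider and returns its text response.

COUNCIL = ["gpt", "gemini", "claude", "grok"]
CHAIRMAN = "claude"

def query_model(model: str, prompt: str) -> str:
    raise NotImplementedError  # wire up to each provider's API

def council_answer(question: str) -> str:
    # 1. Each council member answers the question independently.
    answers = {m: query_model(m, question) for m in COUNCIL}

    # 2. Anonymise: shuffle the answers and strip the model names.
    items = list(answers.items())
    random.shuffle(items)
    bundle = "\n\n".join(
        f"Response {i + 1}:\n{text}" for i, (_, text) in enumerate(items)
    )

    # 3. Each member ranks the anonymised responses for accuracy and insight.
    reviews = [
        query_model(
            m,
            f"Question: {question}\n\n{bundle}\n\n"
            "Rank these responses from best to worst, explaining your reasoning.",
        )
        for m in COUNCIL
    ]

    # 4. The chairman synthesises a single consolidated answer.
    return query_model(
        CHAIRMAN,
        f"Question: {question}\n\nResponses:\n{bundle}\n\nReviews:\n"
        + "\n\n".join(reviews)
        + "\n\nSynthesise the best single answer.",
    )
```

Shuffling before labelling is what keeps the review blind: by the time any model sees the bundle, the provenance of each response has already been discarded.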
Karpathy describes the project as a weekend hack, built while exploring different LLMs for reading books together. He's released it as-is, explicitly stating he won't maintain or improve it. "Code is ephemeral now and libraries are over," he writes. "Ask your LLM to change it in whatever way you like." A statement that rather proves the point about how capable these tools have become.
The repository has attracted nearly ten thousand stars in short order. Clearly the idea resonates. And why wouldn't it? We've long understood that diverse perspectives improve decision-making. Multiple reviewers catch errors that individuals miss. Ensemble methods outperform single classifiers in machine learning. The same logic applies here.
The implications for agentic AI workflows are tantalising. Imagine coding agents that don't just plan and critique their own work, but consult a council of models at key decision points. Technology choices reviewed by multiple perspectives before implementation begins. Code critiqued by different models, each catching different classes of error. Documentation assessed for clarity by minds trained on different corpora. The ensemble approach could slot naturally into the kind of phased, reflective workflows that already produce the best results.
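Purely as a thought experiment, such a council check could slot into the phased workflow sketched earlier. The snippet below reuses the hypothetical `ask_agent()` and `council_answer()` helpers from the earlier sketches and describes no existing tool.

```python
# A thought experiment only: a council review gate inside the phased workflow,
# reusing the hypothetical ask_agent() and council_answer() helpers from above.

def implement_with_council(phases: int) -> None:
    for n in range(1, phases + 1):
        # Ask the agent to commit to an approach before writing any code.
        proposal = ask_agent(
            f"Describe your intended approach for phase {n} of PLAN.md, "
            "including technology choices, before implementing anything."
        )
        # Have the council review the approach at this decision point.
        verdict = council_answer(
            "Review this implementation approach and flag risks, gaps, "
            f"or better alternatives:\n\n{proposal}"
        )
        # Implement with the review in hand, then reflect as before.
        ask_agent(f"Implement phase {n}, taking this review into account:\n\n{verdict}")
        ask_agent(f"Review the work completed in phase {n} and apply any improvements.")
```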
At inmydata, we're exploring these patterns in our work on expert systems for demand prediction and buyer assistance, where the cost of a wrong answer is measured in real money. Having multiple models cross-check forecasts and recommendations before they reach a buyer feels less like a luxury and more like due diligence.
We are, I suspect, only beginning to grasp how to work effectively with these tools. A year ago, the idea of a coding agent implementing a complex system in days seemed fanciful. Now it's routine for those who've learned the patterns. The models improve relentlessly, but perhaps more importantly, our understanding of how to use them deepens with each project. Those shortcomings we blamed on the technology? Turns out they were mostly ours.