4 February 2026
OpenClaw and the Security Tax of Real Agency
What a viral lobster teaches us about pent-up demand for personal AI and the price of actually getting things done.
OpenClaw did not go viral because it is a better chatbot. It went viral because it is the first mainstream-feeling agent that actually does things across your messy, real-world workflow.
For years we have been promised AI assistants that would organise our lives. What we got instead were voice-activated search engines that set timers and read out the weather. Then in late 2025, an Austrian developer named Peter Steinberger released a weekend project that became a full platform in roughly two months, crossing 100,000 GitHub stars and pulling in about two million visitors in a single week.
That adoption curve is the demand signal. People have wanted a general-purpose personal agent for years. The moment something showed up that felt like it could live inside your chat apps and reach into your actual tools, it escaped the tech bubble and became a mainstream phenomenon.
OpenClaw is not interesting because it is technically novel. It is interesting because it proved that the appetite for autonomous agents is enormous, and that the security implications of satisfying that appetite are terrifying.
The Rise of the Lobster
The naming saga alone tells you how fast this moved. The project started life as Clawdbot, a playful nod to the Claude model it originally ran on. Anthropic sent a polite trademark nudge, so Steinberger rebranded to Moltbot. That name never stuck. By late January 2026 he had settled on OpenClaw, complete with proper trademark and domain work.
But the rebrand day drama was not just colour. It was an early example of what happens when identity becomes part of the attack surface. During the confusion, opportunists hijacked the old social media handles within seconds. A fake $CLAWD token hit sixteen million dollars in market cap before collapsing. Users followed the wrong links. Impersonation and phishing spread through the community before anyone could react.
The virality spilled into real-world market behaviour. Cloudflare’s share price jumped fourteen to twenty percent over two days as analysts framed agents as workloads that push more traffic and edge compute. Mac Mini sales spiked as users bought dedicated always-on machines to run their personal agents around the clock. Whether or not the fundamentals justify the excitement, agents have become a macro story fast.
The best examples of what OpenClaw can do are not the flashy coding demos. They are the mundane friction it navigates like a determined assistant. The 1Password team published an analysis that reads like a realistic day in the life of modern app failure. A user asked OpenClaw to book a restaurant. It tried the polite in-app path through OpenTable. That failed. So the agent routed around it. It downloaded an AI voice skill, followed its instructions, placed an actual phone call to the restaurant, and completed the booking over the phone. No human intervention required.
That is not a pre-programmed routine. That is dynamic behaviour born from an agentic loop that takes a goal and improvises a plan, grabbing whatever tools it needs to execute. It applied general world knowledge, specific skills, and near-perfect memory to close a loop the traditional way could not close.
This is the product-market fit people have been waiting for. They do not want a bot that answers questions. They want a bot that gets things done.
The Security Nightmare
Here is the uncomfortable truth. OpenClaw works because it collapses three worlds into one default experience. Untrusted content. Private data. The ability to take real actions. Security researchers have been warning for years that this combination creates uniquely dangerous failure modes. OpenClaw hit the mainstream before the safety patterns hardened.
The incidents started almost immediately. Researchers from Koi Security scanned the ClawHub marketplace and found 341 malicious skills in a single coordinated campaign they called ClawHavoc. These were not subtle exploits. They were skills that looked legitimate but installed malware when users followed their instructions. The pattern is familiar from every other software ecosystem. Attackers write once and distribute broadly. Only now the container is an agent skill rather than an npm package or browser extension.
The vulnerability of the ecosystem became clear through multiple independent investigations. Security researcher Jamieson O’Reilly documented hundreds of OpenClaw instances exposed to the public internet, many leaking API keys, OAuth credentials, and conversation histories. Separately, Cisco’s AI Threat and Security Research team analysed a third-party skill called “What Would Elon Do” that had been artificially inflated to the number one position on ClawHub. When they ran it through their open-source Skill Scanner, they found it was functionally malware. It executed silent data exfiltration through a curl command that sent information to an external server without user awareness. It also used direct prompt injection to force the assistant to bypass its own safety guidelines and run commands without asking. Nine security findings in total. Two critical. Five high severity. A single malicious skill download and your machine is compromised.
The problem is not limited to third-party skills. Even without them, an agent platform that can read and write files, call shells, drive browsers, and connect to messaging becomes a privileged automation layer. OpenClaw’s default execution posture is effectively full host access unless you opt into sandboxing. Most hobby users will not.
Combine that with how people actually run it. Always on. Logged into chat apps. Sitting on a box that also stores API keys, tokens, logs, and long-term memory. 1Password explicitly highlighted the risk of plaintext artefacts being vacuumed up by commodity infostealers if the host is ever compromised. Your authentication tokens are one thing. A detailed memory file that describes who you are, what you are building, how you write, and who you work with is something else entirely. That is the raw material for impersonation, phishing, and blackmail at a level we have not seen before.
The project’s own documentation admits it. There is no perfectly secure setup.
This is not a criticism of open source. OpenClaw is insecure because it is trying to be a real assistant, and real assistants need authority. The lesson is that agent security is mostly about governing authority, not about making the model nicer. The architecture that would help here is one that separates the part of the agent that decides what to do from the part that actually does it. A planner that proposes actions, and an executor that enforces constraints before carrying them out. OpenClaw collapses these into one, and that is where the risk concentrates.
Why Siri Feels Safe and Limited
If you want to understand the trade-off OpenClaw represents, compare it to the voice assistants we already have. Siri, Alexa, and Google Assistant feel safe because they are. They have app review processes, curated integrations, permission systems legible to non-experts, strong default sandboxing, and legal incentives that make arbitrary code execution an unacceptable risk.
But they also feel limited because they are. They do not really problem-solve. They do not adapt to your messy edge cases. They do not improvise across channels. They are assistants in the voice-command-router sense.
OpenClaw feels magical precisely because it is a general-purpose automation layer that can improvise. It sits on your machine, lives in your chat apps, and can be extended quickly through skills. It handles the long tail. But the exact thing that makes it useful is what makes it hard to make safe. There is no corporate perimeter forcing safe defaults. You are on your own.
Alexa is safe because it cannot do much. OpenClaw is exciting because it can do almost anything, and terrifying for exactly the same reason.
This is the core tension we need to resolve. The pent-up demand is real. People genuinely want autonomous agents that close loops on their behalf. But every increment of capability brings a corresponding increment of risk. We cannot keep pretending that users will choose safety over convenience. They will not. If you give people a choice between a tool that asks permission for everything and a tool that just gets things done, they will choose the tool that gets things done.
Building Agents That Honour Decades of Security Practice
So how do we build capable agents that honour the security frameworks we have spent decades developing? We already know how to secure powerful actors. We just have not applied those patterns cleanly to agent UX yet.
The principles are not new. Least privilege by default. Start agents with near-zero authority. Add capabilities one by one, with time bounds and scope bounds. Permissions should be specific. Read this folder. Send messages to these contacts. Not full disk and WhatsApp.
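Grants of that shape are easy to make concrete. The sketch below is a minimal, hypothetical illustration of least privilege: every capability names one action, one scope, and one expiry, and the agent holds nothing it was not explicitly given. All names here are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from pathlib import Path

@dataclass(frozen=True)
class Capability:
    """A narrow, time-bounded grant: one action, one scope, one expiry."""
    action: str          # e.g. "fs.read", "msg.send"
    scope: str           # e.g. a folder path or a contact handle
    expires: datetime

    def permits(self, action: str, target: str, now: datetime) -> bool:
        if now >= self.expires or action != self.action:
            return False
        if self.action.startswith("fs."):
            # Filesystem scopes cover the granted folder and anything beneath it.
            return Path(target).resolve().is_relative_to(Path(self.scope).resolve())
        return target == self.scope  # exact match for contacts, endpoints, etc.

# The agent starts with zero authority; every grant is explicit and narrow.
grants = [
    Capability("fs.read", "/home/me/notes", datetime.now(timezone.utc) + timedelta(hours=1)),
    Capability("msg.send", "alice@example.com", datetime.now(timezone.utc) + timedelta(minutes=10)),
]

def allowed(action: str, target: str) -> bool:
    now = datetime.now(timezone.utc)
    return any(g.permits(action, target, now) for g in grants)
```

With grants like these, reading a note inside the scoped folder succeeds, while reading `/etc/passwd` or messaging an ungranted contact fails by default rather than by exception.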
Capability separation matters. The planner-executor split introduced earlier is the key architectural move. Let the planning layer propose actions, but force the execution layer through a policy gate that enforces rules, rate limits, sensitive action approval, and environment constraints. This is where you insert the human in the loop, or the policy that acts on their behalf.
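The planner-executor split above can be sketched in a few lines. This is an illustrative skeleton, not any project's real API: the planner emits `ProposedAction` objects, and the executor refuses to run them until rate limits and sensitive-action approval have been checked. The tool names and limits are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ProposedAction:
    tool: str                 # e.g. "shell", "email", "browser"
    args: dict

SENSITIVE_TOOLS = {"shell", "email", "payments"}   # always require approval
RATE_LIMITS = {"browser": 30}                      # max calls per session (illustrative)

@dataclass
class Executor:
    approve: callable                       # human-in-the-loop hook
    counts: dict = field(default_factory=dict)

    def run(self, action: ProposedAction):
        # Rule 1: rate-limit chatty tools.
        n = self.counts.get(action.tool, 0) + 1
        self.counts[action.tool] = n
        if n > RATE_LIMITS.get(action.tool, float("inf")):
            return ("denied", "rate limit exceeded")
        # Rule 2: sensitive tools never run without explicit approval.
        if action.tool in SENSITIVE_TOOLS and not self.approve(action):
            return ("denied", "approval withheld")
        # Only now does the action reach the real tool layer (stubbed here).
        return ("executed", action.tool)
```

The point of the shape is that the planner can be as creative as it likes; nothing it proposes touches the world without passing through `run`.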
Strong sandboxing should be on by default. If tools run on the host, make that an explicit advanced mode, not the default. Containers and ephemeral workspaces should be the starting point.
Provenance and trust for skills should work like packages. Require signing. Verified publishers. Reproducible builds where possible. Clear provenance metadata. The ClawHub incident is the agent version of a dependency typosquat. We solved this problem before. We can solve it again.
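A minimal version of that provenance check can be sketched with nothing but digest pinning. Real registries would use asymmetric signatures and verified publisher identities; the record and skill name below are invented, and the hash pin is the simplest stand-in for the idea.

```python
import hashlib

# Hypothetical provenance record a registry would publish alongside a skill.
PROVENANCE = {
    "name": "restaurant-booker",
    "version": "1.2.0",
    "publisher": "verified:example-labs",
    "sha256": hashlib.sha256(b"skill archive bytes").hexdigest(),
}

def verify_skill(archive: bytes, record: dict) -> bool:
    """Refuse to install unless the archive matches its pinned digest
    and the publisher carries the registry's verified mark."""
    digest = hashlib.sha256(archive).hexdigest()
    return digest == record["sha256"] and record["publisher"].startswith("verified:")
```

A tampered archive, or one from an unverified publisher, simply never installs: the same lockfile logic that blunted dependency typosquatting elsewhere.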
Secrets hygiene needs to be automatic. No plaintext tokens in predictable locations. Use OS keychains or secret stores. Minimise transcript retention or encrypt it. The infostealer risk is the warning shot here.
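The infostealer risk is also detectable. A small scanner like the sketch below, run over an agent's workspace, flags token-shaped strings sitting in plaintext; the patterns are illustrative and far from exhaustive, and a real tool would check far more formats and locations.

```python
import re
from pathlib import Path

# Illustrative patterns for common plaintext credentials (not exhaustive).
TOKEN_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),           # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),           # GitHub personal access tokens
    re.compile(r"xox[baprs]-[A-Za-z0-9-]{10,}"),  # Slack tokens
]

def scan_for_secrets(root: Path) -> list[tuple[Path, str]]:
    """Flag files under an agent's workspace that contain token-shaped strings."""
    findings = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for pattern in TOKEN_PATTERNS:
            for match in pattern.findall(text):
                findings.append((path, match[:8] + "..."))  # never log full secrets
    return findings
```

Anything this finds belongs in an OS keychain or secret store, not on disk where a commodity infostealer can vacuum it up.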
Observability and audit trails are essential. Agents need logging like any privileged system. What data was accessed. What tools ran. What external endpoints were called. What messages were sent. This is how you detect misuse and how you build user trust.
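The logging itself is not exotic. An append-only JSON Lines trail, one entry per privileged action, is enough to answer the four questions above after the fact. The event names here are assumptions, sketched for illustration.

```python
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only JSON Lines log of every privileged agent action."""
    def __init__(self, path):
        self.path = path

    def record(self, event: str, **details):
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": event,       # e.g. "tool.run", "data.read", "msg.send"
            **details,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")
        return entry
```

Because each line is self-describing, ordinary log tooling can answer "what endpoints did the agent call yesterday" without any agent-specific infrastructure.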
We can borrow design patterns from the coding agents that are already working. Claude Code, Cursor, and similar tools work well when they are confined to a repository and their changes are inspectable as diffs before merge. Transfer that to general agents. Proposed actions should be previewable, with clear deltas, before execution.
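The diff-before-merge pattern transfers almost directly. A sketch of the general-agent version: before the executor is allowed to write a file, the proposed change is rendered as a unified diff for the user to review, exactly as a coding agent's changes are reviewed before merge.

```python
import difflib

def preview(path: str, old: str, new: str) -> str:
    """Render a proposed file edit as a unified diff for user review
    before the executor is allowed to write anything."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}",
    ))
```

The same idea generalises beyond files: a proposed email, calendar change, or purchase can be shown as a delta against the current state before it executes.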
Policy-driven tool use is already standard in coding agents. You can enforce no network, read-only, ask before running commands. General agents need the same modes. Safe browsing only. No financial actions. No external messaging without confirmation.
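Those modes can be expressed as data rather than code, which makes them auditable and user-editable. The mode and tool names below are hypothetical; the point is the declarative allow/deny shape.

```python
# Hypothetical operating modes mapping the constraints named above
# onto explicit allow/deny rules per tool.
MODES = {
    "safe_browsing": {
        "allow": {"browser.read"},
        "deny": {"payments.*", "msg.external", "shell"},
    },
    "no_financial": {
        "allow": {"browser.read", "msg.external"},
        "deny": {"payments.*"},
    },
}

def permitted(mode: str, tool: str) -> bool:
    policy = MODES[mode]
    denied = any(tool == d or (d.endswith(".*") and tool.startswith(d[:-1]))
                 for d in policy["deny"])
    return tool in policy["allow"] and not denied
```

Switching the agent from one mode to another is then a one-line configuration change rather than a code change, which is what makes progressive disclosure practical.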
Progressive disclosure of power keeps users safe while they learn. Start with safe, high-value skills. Only later offer deeper automation. Users should not have to understand everything on day one.
Anthropic has already open-sourced the sandbox runtime they built for Claude Code. It uses OS-level primitives to enforce filesystem and network isolation. The key insight from their work is that you need both. Without network isolation, a compromised agent can exfiltrate files. Without filesystem isolation, a compromised agent can backdoor system resources to gain network access. Both layers must work together.
Microsoft has published their approach to securing Model Context Protocol on Windows. Proxy-mediated communication. Tool-level authorisation. A central server registry where only servers meeting baseline security criteria are available. These are the building blocks of a governable agent ecosystem.
The Path Forward
OpenClaw is the prototype that proves demand. It demonstrated that people will adopt autonomous agents immediately if they feel useful, even when the security posture is obviously inadequate. That is both exciting and sobering. OpenClaw will not be the last such project, and it may not even be the most important one. But it is the one that made the pattern undeniable.
The next generation wins by making powerful agency boringly governable. The way we made servers, databases, and production deploys governable over decades. Not by removing capability, but by wrapping it in the right controls.
The question is no longer whether AI agents will become mainstream. They already are. One hundred thousand GitHub stars in two months tells you that. The question is whether we build the infrastructure to make them safe before the first major incident, or after.
We have done this before. Package signing. Dependency scanning. Sandboxed execution. Audit logging. Least privilege defaults. These are not theoretical concepts. They are production realities in every other domain where we run untrusted code.
The lobster showed us the demand. Now we need to earn the trust.