2 March 2026 · Nick Finch
A Friday Afternoon Pen Test and a Trillion-Dollar Question
I built, deployed, and ran a penetration testing suite in an afternoon. That's a perfect case study of why software stocks are in freefall.
At 1:30pm on a Friday, I started building a penetration testing suite. By 3:30pm it was finished. By 4:00pm it was running its first scans against our production applications. By 5:00pm the issues it surfaced had been fixed and I was on my way to the pub.
That is not a brag. It is an illustration of why a trillion dollars has been wiped off software stocks in the past month.
The Conversation That Started It
Something shifted in November. Agentic coding tools crossed a threshold where they stopped being assistants and started being genuine collaborators. Since that moment, I have personally built and deployed more working software than our entire team would have shipped in a year. Not prototypes. Not demos. Production applications, live and serving customers. I wrote about this shift in January, and I have written about it several times since, because the acceleration has not slowed. If anything it is compounding.
That pace creates a new problem. When you are shipping this fast, your old approach to security does not hold up anymore.
We had always outsourced penetration testing to a third-party firm. They would come in periodically, run their engagement over a few weeks, deliver a branded PDF, and we would work through the findings. That was fine when we were shipping a couple of major releases a year. It is not fine when we are deploying new applications every week.
At lunchtime on Friday I had a conversation with a colleague about exactly this. The volume of work we are now producing means our periodic pen tests leave long gaps where new code sits untested. We could not afford to commission a new engagement every time we shipped something. Even if we could, the turnaround time would create a bottleneck that defeated the whole point of moving faster.
So we asked a simple question. Could we build our own continuous pen testing suite?
Two Hours
The answer, it turns out, was yes.
The pipeline we built combines OWASP ZAP for dynamic application security testing and Nuclei for template-driven vulnerability detection, orchestrated through a single Python script that handles the full workflow. Authenticate via AWS Cognito. Spider the application. Run active scans. Execute custom detection templates targeting patterns common in the type of agentic applications we are building. Merge findings, deduplicate, rank by severity, and generate a unified HTML report with remediation guidance.
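To make the merge-deduplicate-rank step concrete, here is a minimal sketch of how that stage of such a pipeline might look. The JSON shapes and function names are illustrative assumptions, not our actual implementation: in practice ZAP and Nuclei each emit their own report format, which would first be normalised into a common dict shape like the one below.

```python
# Sketch of the merge/dedupe/rank stage of the pipeline.
# The finding dicts below are a simplified, assumed shape; real ZAP and
# Nuclei JSON output would be normalised into this form first.

# Order used to rank merged findings, most urgent first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

def merge_findings(*reports):
    """Merge findings from multiple scanners, dropping duplicates.

    Each finding is a dict with at least 'name', 'url', and 'severity'.
    Two findings are treated as duplicates when they share a name and URL,
    which is how the same issue reported by both ZAP and Nuclei collapses
    into a single entry.
    """
    seen = set()
    merged = []
    for report in reports:
        for finding in report:
            key = (finding["name"].lower(), finding["url"])
            if key in seen:
                continue
            seen.add(key)
            merged.append(finding)
    # Rank by severity; unknown severities sort last.
    merged.sort(key=lambda f: SEVERITY_RANK.get(f["severity"], 99))
    return merged

# Example: one ZAP-style and one Nuclei-style report with one overlap.
zap = [
    {"name": "X-Content-Type-Options missing",
     "url": "https://app.example.com/", "severity": "low"},
    {"name": "SQL Injection",
     "url": "https://app.example.com/search", "severity": "high"},
]
nuclei = [
    {"name": "SQL Injection",
     "url": "https://app.example.com/search", "severity": "high"},
    {"name": "Exposed .git directory",
     "url": "https://app.example.com/.git/", "severity": "medium"},
]

findings = merge_findings(zap, nuclei)
```

The deduplication key is deliberately coarse; a production version would also fold in the scanner's own rule or template ID before feeding the ranked list into the HTML report.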
Two hours from first line of code to working pipeline. Thirty minutes to run the first scans. One hour to review and fix what it found.
A typical commercial pen test costs between five and twenty-five thousand pounds. The timeline is two to four weeks from scoping call to final report, with the actual testing often consuming three to five consultant days. There is a sales cycle before that, scheduling availability, scoping questionnaires, and often a wait of several weeks to get into a tester’s calendar.
We compressed the equivalent of maybe sixty to seventy percent of that coverage into a Friday afternoon, at essentially zero marginal cost per subsequent scan, and we can run it again tomorrow after every deployment.
What It Does and What It Does Not
Let me be honest about the limitations, because they matter for the argument I want to make.
What we built covers a significant portion of what a commercial pen test delivers. It handles known vulnerability scanning, OWASP Top 10 coverage, common misconfiguration detection, CVE checks, and authenticated crawling of application surfaces. For catching regressions and well-understood vulnerability classes on every deployment, it is genuinely excellent.
What it cannot do is think creatively. A professional pen tester will examine your application’s business logic, understanding that a user should not be able to modify another user’s booking by changing an ID in a request, or that a discount code can be applied twice through a specific sequence of API calls. These flaws do not match any scanner template. They require someone who understands what the application is supposed to do and then tries to break those assumptions.
Professional testers also chain vulnerabilities together. They combine a low-severity information disclosure with a medium-severity access control weakness to achieve a critical-impact data breach. Automated scanners report findings in isolation. They do not reason about how weaknesses compound.
And then there is compliance. A CREST-certified report from a named firm carries weight with clients, insurers, and auditors that no automated scan can replicate. If a contract says you need a pen test, it usually means a pen test from an accredited provider.
Those are real gaps. But here is where it gets interesting.
The Agent That Reads Your Code
Anthropic launched Claude Code Security last week. Using Opus 4.6, their team found over 500 high-severity vulnerabilities in production open-source codebases, bugs that had survived decades of expert review and millions of hours of fuzzing. The critical distinction from traditional static analysis is that Claude Code Security does not scan for known patterns. It reads and reasons about code the way a human security researcher would, tracing data flows, understanding component interactions, and catching vulnerabilities that no rule set covers.
Now consider what happens when you combine that kind of source-level reasoning with the DAST pipeline we built on Friday.
An agentic pen testing system with access to your source code can read the codebase and understand the application’s intended behaviour, its authentication model, its data flows, its business rules. It can identify that a particular API endpoint has weak authorisation checks. Then, instead of just flagging it in a report, it can dynamically configure the DAST tools to test that specific endpoint with crafted payloads and confirm whether the vulnerability is actually exploitable at runtime.
That is the workflow of a human pen tester. Read the code. Form hypotheses. Test them. The business logic gap, which has always been the strongest argument for human testers, shrinks dramatically when the agent has source access and can reason about application intent. The chained exploitation problem becomes tractable too, because an agent with full source visibility can identify relationships between vulnerabilities that scanners working in isolation never would.
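To make that read-hypothesise-test loop concrete, here is a rough sketch of the hand-off step: an agent that suspects a weak authorisation check on an endpoint emits a targeted Nuclei-style template to confirm exploitability at runtime. The endpoint, IDs, and helper name are made-up illustrations, and the template fields follow the general shape of Nuclei's YAML format rather than the output of any shipped tool.

```python
# Hypothetical hand-off between source-level reasoning and the DAST layer.
# The agent has flagged an endpoint as a suspected IDOR (one user able to
# read another user's booking) and generates a targeted template so the
# scanner can confirm the finding against the running application.

def idor_template(endpoint: str, victim_id: str) -> str:
    """Build a Nuclei-style YAML template that requests another user's
    resource and flags an HTTP 200 response as a confirmed finding."""
    return f"""\
id: suspected-idor-{victim_id}
info:
  name: Suspected IDOR on {endpoint}
  severity: high
http:
  - method: GET
    path:
      - "{{{{BaseURL}}}}{endpoint}/{victim_id}"
    matchers:
      - type: status
        status:
          - 200
"""

template = idor_template("/api/bookings", "4242")
```

In a full loop, the agent would run this template while authenticated as a different user, and a match would convert a static-analysis hypothesis into a confirmed, exploitable finding in the report.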
StackHawk made exactly this point in their analysis of Claude Code Security’s launch. The tool does not run your application. It cannot send requests through your API stack, test how your auth middleware chains together, or confirm whether a finding is actually exploitable in your environment. That is the gap. Our DAST pipeline fills it. The combination closes the loop that neither tool can close alone.
The remaining gaps are real but narrow. Infrastructure and cloud configuration review sits outside the web application layer, though extending the agent’s scope to include Terraform files and AWS configs is entirely feasible. Social engineering is obviously out of scope. Truly novel attack vectors requiring creative lateral thinking that current models have not demonstrated remain a human strength, for now. But for the vast majority of web applications, this combination would deliver coverage that equals or exceeds what most mid-tier pen testing firms actually provide. Not annually. Continuously. On every commit.
The Uncomfortable Truth About Pen Testing
The traditional pen testing model is built on scarce expertise and labour-intensive delivery. Firms charge premium day rates because good testers are rare and the work has historically been difficult to automate. That moat is eroding fast.
Here is the part the industry does not love discussing. A significant proportion of paid pen test engagements, particularly at the lower price points, are not much more than what our Friday afternoon pipeline does, wrapped in a branded PDF with a consultant’s name on it. Run Burp Suite and Nessus, tidy up the output, add some contextual commentary, present the findings. That tier of work is becoming very difficult to justify commercially when a technically capable team can replicate it in an afternoon with agentic tooling.
What happens is a classic squeeze. The low-to-mid end of the market gets automated away. Value concentrates at the top, in genuine red team operations, novel exploit development, complex adversary simulation, and the kind of creative lateral thinking that requires deep human expertise. The “run some scanners and write a report” tier is in real trouble.
The compliance angle adds another dimension. Right now, many organisations pay for pen tests primarily because a framework or client contract says they must. If the industry and regulatory bodies start recognising automated continuous testing as an acceptable alternative, or even a superior one given its frequency, that removes another prop from the traditional model.
The Pattern You Have Seen Before
If you have been reading my articles over the past few months, this story should feel familiar.
In February, a trillion dollars was wiped off software stocks. The narrative from some quarters was that this was the AI bubble finally bursting. It was not. As I wrote at the time, the sell-off was not driven by AI disappointing investors. It was driven by AI terrifying them, because it demonstrated, concretely, that it could do the work that justifies billions in software subscription revenue. That is not a bubble bursting. That is disruption happening in real time.
Pen testing is just one example. But it illustrates the pattern perfectly. A category of professional services work that commands premium pricing, built on genuine expertise but delivered through a model that wraps automation in human labour and bills by the day. AI does not eliminate the need for the expertise. It eliminates the need for the labour-intensive delivery model. The value shifts from execution to insight, from running tools to understanding results, from doing the work to knowing what work needs doing.
This same dynamic is playing out across every software category simultaneously. CRM, analytics, security, DevOps, project management. The companies that built their moats on the complexity of their products, on the training required to use them, on the professional services ecosystems that surrounded them, are watching those moats drain. Satya Nadella said it plainly. Business applications are essentially CRUD databases with a bunch of business logic. The business logic is all going to the agents.
The Value Has Not Disappeared
This is the point that gets lost in the stock market panic. AI is not destroying the software market. The need for secure applications has not gone away because we automated pen testing. If anything, the need has grown, because we are shipping so much more software now. The need for analytics has not gone away. The need for customer relationship management has not gone away.
What is shifting, rapidly and violently, is where the value sits. It is moving away from traditional software businesses that sell seats and bill for implementation, and toward those that are embracing AI tools to deliver outcomes at a fraction of the previous cost and time. The total value in the market may well grow. But who captures it is changing at a pace that public markets, built on quarterly forecasts and twelve-month price targets, are struggling to process.
A Friday afternoon pen test is a small story. But it illustrates the whole pattern. The tools are here. The threshold has been crossed. The question for every software business is not whether this shift is coming. It is whether they are the ones doing the eating, or the ones on the menu.