AI Agent Attack-Success-Rate: A CVSS for Your Risk Register

For years, the risk an AI agent carried was a paragraph of hand-waving. On 28 May 2026 that changed: Anthropic's Claude Opus 4.8 system card reported that a browser-using agent was hijacked by a prompt-injection attack 31.5% of the time with no safeguards, and 0.5% with them. That is an attack-success-rate, and it does for an AI agent what CVSS does for a firewall CVE. It turns a vague worry into a score you can put on a risk register, assign an owner, and treat. If you run a regulated network, that is the development that should change how you govern agents this quarter.

Across enterprise and KRITIS-regulated estates, the pattern never varies: the risks that get managed are the ones with a number next to them. A firewall CVE lands with a CVSS, so it enters triage, gets a patch SLA, and shows up in the audit trail. An AI agent wired into the same network arrived with nothing, so it sat off the register entirely. Anthropic just removed that excuse. The agent is now a measurable, privileged asset, and the discipline you already apply to firewall change is the discipline it needs.

What changed: AI agent risk got a number

The headline figure is the 31.5% browser-use attack-success-rate without safeguards, but the useful part is the spread. With safeguards enabled, the same measurement fell to 0.5%, a reduction of more than sixtyfold. In a coding tool-use setting against an adaptive attacker, Anthropic reported 7.03% without safeguards and 2.09% with. According to the Anthropic Opus 4.8 system card, these were measured against held-out injection attacks across defined agent environments, and the card even publishes a red-team metric where the new model scored worse than its predecessor. That willingness to print a regression is exactly what makes the number trustworthy enough to govern against.
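The spread is plain arithmetic, and it is worth checking before quoting it to a risk committee. The snippet below is a minimal sketch that recomputes the reduction factors from the four percentages given above; the function name `reduction_factor` is illustrative and not taken from the system card.

```python
def reduction_factor(asr_without: float, asr_with: float) -> float:
    """Return how many times lower the attack-success-rate is with safeguards."""
    if asr_with <= 0:
        raise ValueError("asr_with must be positive to compute a ratio")
    return asr_without / asr_with


# Percentages as quoted in the text above.
browser = reduction_factor(31.5, 0.5)
coding = reduction_factor(7.03, 2.09)

print(f"browser use: {browser:.1f}x lower with safeguards")
print(f"coding tool use: {coding:.2f}x lower with safeguards")
```

The browser-use figure works out to a factor of 63, while the coding figure against an adaptive attacker is closer to 3.4, so the size of the improvement depends heavily on the setting and the attacker.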

The reason this matters to a network owner is not the specific value. It is that an attack-success-rate is now a thing that exists, the same way a CVSS score exists. You can compare two agent platforms on it, set a threshold in procurement, and demand the figure from any vendor that wants to sit inside your perimeter. Prompt injection has been the number-one entry on the OWASP Top 10 for LLM applications since 2023. What was missing was a way to quantify your exposure to it. Now there is one.

Why an AI agent belongs on your firewall risk register

An AI agent with tools is a privileged asset. It reads untrusted input, holds credentials, and can act on systems, which is precisely the profile of a firewall management path or an automation account. You already track those on your risk register because they are reachable, powerful, and capable of causing damage when subverted. An agent that can browse, call APIs, or run code meets the same test. Leaving it off the register is not a smaller decision than leaving a management interface off it.

The threat model is the one covered before in the context of AI agent security threat modelling: the agent is manipulated through the content it processes rather than the code it runs. This spring, an unauthenticated PAN-OS root RCE taught the firewall world the same lesson from the other direction: an unauthenticated input can become full control, and the response to it was a change-management problem, not a coding one. An agent hijack is the same shape of event: an external input crossing into privileged action. It belongs in the same register, scored the same way.

CVSS for a CVE, ASR for an agent: the mapping

The analogy is not loose. An attack-success-rate slots into the risk-management workflow at exactly the points a CVSS score does. The table below is the mapping to put in front of a risk committee.

Risk-register dimension	Firewall CVE (CVSS)	AI agent (attack-success-rate)
What the number scores	Exploitability and impact of a vulnerability	Share of injection attempts that hijack the agent
Where it comes from	Vendor advisory, NVD	Vendor system card, or your own eval harness
Why it changes	Reassessed when exploitation is observed	Shifts with safeguards, tools, and threat model
The treatment it triggers	Patch SLA, compensating controls	Safeguards, least-privilege tools, human approval gate
Residual after mitigation	Unpatched exposure you accept or isolate	The 0.5% that still gets through, scored by blast radius
What an auditor wants	Evidence the patch was authorised and applied	Evidence the agent was assessed, treated, and re-checked

The last row is the one regulated operators underestimate. An auditor does not accept "we use a leading model" as a control any more than they accept "we run a leading firewall" as patch evidence. They want the assessment, the treatment, and the proof it was reviewed. An attack-success-rate gives you the first of those three: a measured assessment to put on record.
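The residual row in the table above deserves a worked number, because a small per-attempt rate compounds with exposure. The sketch below assumes each injection attempt succeeds independently with the same probability. That is a simplifying assumption made here for illustration, not a claim about how the system card defines or measures its figure.

```python
def p_at_least_one_hijack(asr: float, attempts: int) -> float:
    """Probability that at least one of `attempts` injections succeeds.

    Assumes independent attempts with an identical success probability,
    which real attack campaigns may not satisfy.
    """
    if not 0.0 <= asr <= 1.0:
        raise ValueError("asr must be a probability between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must not be negative")
    return 1.0 - (1.0 - asr) ** attempts


# 0.5% residual rate from the text, across a range of exposure levels.
for attempts in (1, 10, 100, 1000):
    chance = p_at_least_one_hijack(0.005, attempts)
    print(f"{attempts:>5} attempts -> {chance:.1%} chance of at least one hijack")
```

Under that assumption, a 0.5% rate is roughly a 39% chance of at least one hijack across 100 attempts, which is why the residual has to be scored by blast radius rather than dismissed as negligible.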

How to put an agent on the register

Treating an agent as a register entry is a five-step exercise, and none of it is novel if you already run firewall change properly.

First, identify every agent that can act, and write down its blast radius: what it can read, what it can call, what it can change. That reach, not the model's benchmark, is your real risk. Second, attach a number: use the vendor's published attack-success-rate where one exists, and build your own injection eval where it does not. Third, assign an owner, the same way every firewall rule should have one. Fourth, define the treatment: scope the agent's tools to the task, gate irreversible actions behind human approval, and put default-deny egress around the runtime so a successful hijack cannot exfiltrate. Fifth, set a re-assessment cadence, because the number moves with every model and prompt change.
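The five steps map naturally onto one record per agent. The following is a minimal sketch of such a record. Every field name, the example agent, and the 90-day default cadence are assumptions made for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AgentRegisterEntry:
    """One risk-register row for an agent; all field names are illustrative."""
    name: str
    can_read: list[str]            # step 1: blast radius
    can_call: list[str]
    can_change: list[str]
    attack_success_rate: float     # step 2: fraction between 0 and 1
    asr_source: str                # vendor system card or own eval
    owner: str                     # step 3
    treatments: list[str]          # step 4
    last_assessed: date            # step 5
    reassess_every_days: int = 90  # assumed cadence, tune to your change rate

    def next_review(self) -> date:
        return self.last_assessed + timedelta(days=self.reassess_every_days)

    def gaps(self, today: date) -> list[str]:
        """List what is still missing before the entry is audit-ready."""
        found = []
        if not self.owner:
            found.append("no owner assigned")
        if not self.treatments:
            found.append("no treatment defined")
        if not 0.0 <= self.attack_success_rate <= 1.0:
            found.append("attack-success-rate is not a fraction")
        if today > self.next_review():
            found.append("re-assessment overdue")
        return found


# Hypothetical agent used only to exercise the record.
entry = AgentRegisterEntry(
    name="helpdesk-browser-agent",
    can_read=["ticket queue", "public web pages"],
    can_call=["ticketing API"],
    can_change=["ticket status"],
    attack_success_rate=0.005,
    asr_source="vendor system card, safeguards enabled",
    owner="",
    treatments=["tools scoped to ticketing", "default-deny egress"],
    last_assessed=date(2026, 6, 1),
)
print(entry.next_review())
print(entry.gaps(today=date(2026, 9, 15)))
```

In this example the entry is flagged twice, once for the missing owner and once for the overdue review, which is the kind of finding a register should surface before an auditor does.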

The fourth step is where the firewall discipline transfers most directly. An agent that can act on your network without a recorded, baseline-driven change process is an unauthorised change waiting to happen, and the controls that catch policy drift on a firewall, a known-good baseline and a tamper-evident log, are the same controls that tell you when an agent did something it should not have. The broader practice is the one in the firewall change management guide: nothing privileged acts without authorisation, a recorded baseline, and a tested way back.
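A tamper-evident log can be approximated with a hash chain, where each entry commits to the one before it. The sketch below illustrates that general technique under stated assumptions: the record fields and agent name are hypothetical, and a production log would also need the latest hash anchored somewhere the agent cannot write.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], record: dict) -> None:
    """Add a record that commits to everything logged before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical agent actions, for illustration only.
log: list[dict] = []
append(log, {"agent": "ticket-agent", "action": "read_ticket", "approved_by": None})
append(log, {"agent": "ticket-agent", "action": "update_rule", "approved_by": "j.doe"})
print(verify(log))

log[0]["record"]["action"] = "delete_rule"  # simulate after-the-fact tampering
print(verify(log))
```

The first check passes and the second fails, because rewriting an earlier record changes its digest and no longer matches the hash stored with it.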

NIS2 and ISO 27001 already require this

If you are under NIS2, this is not optional housekeeping. Article 21 requires risk-management measures appropriate to the risks an entity faces, and an AI agent with access to network systems is now plainly one of those risks. You cannot assess what you have not measured, which is why a published attack-success-rate is more than a marketing line: it is the input your risk assessment was missing. The evidence obligations covered in NIS2 firewall evidence in 2026 apply to agent risk in the same way: the assessment and its treatment must be demonstrable, not asserted.

ISO 27001 frames it through its risk-assessment and risk-treatment clauses: identify the asset, assess the risk, decide the treatment, and keep the records. An agent is an asset by any reading of that standard. The same checklist mindset behind an ISO 27001 firewall audit extends cleanly to agents: list them, score them, treat them, evidence them. The German BSI and the wider NIS2 framework will not grant an exception because the asset happens to run a language model. If anything, a novel, powerful, externally influenced asset draws more scrutiny, not less, which is the angle the CISO guide to AI threat detection takes on AI.

Make the number a procurement requirement

The firewall world learned long ago not to buy a box from a vendor who will not publish advisories. Apply the same rule to agents. Ask any agent or AI platform vendor for an attack-success-rate under prompt injection, with the harness and safeguards stated. If they cannot give you one, they have not measured it, and a risk you cannot score never reaches your register. Anthropic set the precedent by publishing; your job is to make that disclosure the floor, not the exception, the same way a CVSS and a security advisory became table stakes for a firewall vendor.
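A procurement requirement like this can be written down as a simple check. The sketch below is one possible shape for it. The class name, the field names, and the 1% ceiling in the usage example are assumptions chosen for illustration, not a recommended threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VendorDisclosure:
    """What a vendor told you about prompt-injection testing (illustrative)."""
    vendor: str
    attack_success_rate: Optional[float]  # fraction 0..1, None if unpublished
    harness_described: bool
    safeguards_stated: bool


def procurement_findings(d: VendorDisclosure, max_asr: float) -> list[str]:
    """Return the reasons a disclosure fails the procurement floor."""
    findings = []
    if d.attack_success_rate is None:
        findings.append("no attack-success-rate published")
    elif d.attack_success_rate > max_asr:
        findings.append(
            f"attack-success-rate {d.attack_success_rate:.2%} exceeds {max_asr:.2%}"
        )
    if not d.harness_described:
        findings.append("test harness not described")
    if not d.safeguards_stated:
        findings.append("safeguards not stated")
    return findings


# Hypothetical vendors; the 1% ceiling is an assumed policy value.
silent = VendorDisclosure("vendor-a", None, False, False)
measured = VendorDisclosure("vendor-b", 0.005, True, True)

print(procurement_findings(silent, max_asr=0.01))
print(procurement_findings(measured, max_asr=0.01))
```

An empty list means the disclosure clears the floor. A vendor that publishes nothing fails on every count, which mirrors the rule that a risk you cannot score never reaches the register.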

Inside your own estate, the discipline scales the same way it does for a multi-vendor firewall fleet. You do not need a different governance model for each agent any more than you need one per firewall vendor, a point that holds across multi-vendor firewall management: one register, one scoring method, one evidence trail, applied consistently.

Build the discipline before you need it

The uncomfortable part is the same as it is with any CVE. The work that protects you is work you have to do before the incident, not during it. You cannot inventory your agents, score them, and define their treatment in the hour an agent gets hijacked, any more than you can build a firewall inventory during an active root RCE. The teams that will handle the first serious agent compromise well are the ones putting agents on the register now, while it is a calm afternoon's work rather than a forensic reconstruction.

This is the discipline FwChange is built around, and it is vendor-agnostic by design. The next privileged asset on your network will be an AI agent, and the questions a regulator asks about it, where is it, what can it do, how was the risk treated, and can you prove it, are the questions a change-management practice answers by default. If you want to know whether your current process would stand up to that, our free NIS2 Readiness Check walks through exactly those questions before an auditor or an incident forces them on you.

About FwChange

FwChange is a firewall change management methodology.