Author: Duncan Greatwood, CEO, Xage Security
On November 13, 2025, Anthropic released its report, “Disrupting the first reported AI-orchestrated cyber espionage campaign,” marking the first confirmed example of a fully AI driven attack. This is the moment security and government leaders have been warning about for years. For more than a decade, researchers predicted that advanced adversaries would eventually use AI to automate reconnaissance, vulnerability discovery, exploit development and even lateral movement. That future has now arrived, and Anthropic’s disclosure provides the proof.
Anthropic’s disclosure identifies this moment as an inflection point: the point at which AI becomes so capable that it is used just as effectively for offense as it is for defense. AI has changed the tempo of attacks from hours to seconds. It has blurred the line between human intent and machine execution. And it has created a playing field where attackers no longer need large teams of experts to launch sophisticated campaigns. All they need is a model to do the heavy lifting.
Attackers Only Need to Be Right Once, So Defenders Need an Advantage
It is often said that attackers hold the advantage because they only need to be right once, while defenders must be right every time. That has always been true, but defenders do retain one advantage that adversaries can never fully take away: control of their environment.
Today, control is synonymous with one architectural approach in particular: Zero Trust. When implemented correctly, Zero Trust does not rely on hope, heuristics or non-deterministic language analysis. It verifies every entity, human or machine, and treats every action as untrusted and unauthorized until proven otherwise.
AI-on-AI Attacks: Why AI Guardrails Cannot Secure AI Systems
The Anthropic incident also exposes a hard truth that the security community must confront. Many organizations’ own AI implementations have depended on guardrails, filters, rules and prompt moderation to control what their AI systems can and cannot do. These guardrails operate at the natural language layer and attempt to prevent malicious behavior by constraining what the model is allowed to respond to and to output.
But, just as the hackers evaded Claude’s guardrails to enable this attack, t adversaries can always jailbreak guardrails by finding creative ways to encode, chain or translate their instructions. If a model can be convinced through clever prompting that a malicious action is legitimate, the guardrail becomes irrelevant.
Once an AI agent gains access to privileged systems or sensitive data flows, there is nothing inherent in a guardrail that can prevent an attacker from escalating privileges, moving laterally or exfiltrating assets. Guardrails cannot guarantee secure outcomes — they are always open to manipulation.
Fortify Your AI With Jailbreak-Proof Zero Trust
To defend against AI-on-AI attacks that operate at machine speed, organizations must adopt a Zero Trust architecture for AI systems. This architecture enforces strict identity checks, fine grained entitlement controls, least privilege delegation and real time authorization at every step.
Even if an agent produces a malicious instruction, the instruction cannot be executed unless the agent’s identity has the rights to perform that action. No prompt or jailbreak can grant privileges that the identity does not have. Every request, action or downstream workflow triggered by an AI agent is monitored and logged with the same rigor applied to human access.
In this architecture, guardrails become a supplemental control, but no longer have to try to act as the primary form of security protection for AI systems.
The AI Reality: Never Trust, Always Verify
The Anthropic disclosure should serve as a wake up call for every organization. We have entered a phase where attackers are experimenting with agentic automation, multi-step reasoning and rapid orchestration of offensive operations. And for those deploying AI, relying on model level guardrails is insufficient.
The only sustainable approach is to assume that no actor, no model and no session is automatically trusted. Security must shift from trusting the guardrails to verifying every transaction and identity. This is how organizations will stay ahead of adversaries that are now leveraging generative AI at unprecedented scale and speed. It is how defenders regain their advantage. And it is how we build AI enabled systems that are not only powerful but trustworthy and secure.
For a deep dive into Zero Trust for AI, download our whitepaper.
