Skip to main content

In the evolving landscape of AI in cybersecurity, AI agents present both opportunities and unique challenges. It’s crucial to understand the gaps in your cybersecurity program to effectively protect these advanced systems. This post will explore AI agent threats, relevant mitigations, and strategies to focus your AI security efforts.

What are AI Agents?

AI agents are advanced systems that utilize artificial intelligence to perform actions or make decisions, often with some degree of autonomy. They are designed to perceive their environment, process information, and take actions to achieve specific goals.

Understanding AI Agent Threat Modeling

The perspectives on AI risk management and the threat landscape for AI agents vary widely. Some believe existing cybersecurity programs are sufficient, while others express alarm about the inadequacy of current controls in an AI-driven world.

Screenshot of a horizontal timeline graphic with callouts representing stakeholder perspectives on AI threats: one callout states existing cybersecurity programs already cover these threats, a central callout labeled 'We all are here' invites explanation, and a right callout notes that current controls do not work in an AI world; highlighting the need for AI firewall and Guardian Agent adaptations.
To highlight this disparity, we will compare the AI agent threat modeling framework MAESTRO from the Cloud Security Alliance to a modified version of STRIDE.

We want to be clear that the point of this article is NOT to say MAESTRO is not needed. We acknowledge there are gaps with STRIDE which is why we have to modify it for an AI agent use case. We are using the widely-adopted STRIDE framework to illustrate how much of a gap MAESTRO is actually filling. Use what works best for your needs.

What is MAESTRO?

MAESTRO (Multi-Agentic System Threat Model) is a 7-layer reference architecture for agentic AI developed by the Cloud Security Alliance. This framework is designed to provide a structured approach to threat modeling for AI agents, addressing the limitations of traditional methods like STRIDE.

Diagram showing the 7 Layers Reference Architecture for Agentic AI: a blue circle labeled "7 Layers" connects to seven horizontal boxes representing Foundation Model, Data Operations, Agent Framework, Deployment and Observability, Evaluation and Robustness, Security and Governance, and Agent Ecosystem, illustrating layered AI firewall and Guardian Agent architecture.
MAESTRO Threat Modeling Framework Specifically for AI Agents from Cloud Security Alliance

Why Does MAESTRO Exist?

Current threat modeling frameworks, like STRIDE, are not well-suited for AI agents. STRIDE, while a good starting point, doesn’t fully address the unique challenges posed by AI agents, such as adversarial attacks and risks associated with unpredictable learning and decision-making. MAESTRO provides a more tailored approach to threat modeling in this context.

A Modified STRIDE + ML

To better understand the gaps that AI creates in a cybersecurity program, let’s modify STRIDE so it can handle the unique challenges of AI agents. This is done by incorporating two new threat categories “Misunderstanding” and “Lack of Accountability” (ML), which can be employed.

  • Misunderstanding: This refers to models having undesirable assessments due to a lack of context or malicious intervention, leading to unexpected emerging behaviors.
  • Lack of Accountability: This occurs when actions are performed without clear governance or ownership, making it difficult to determine responsibility when issues arise.

Applying STRIDE + ML to AI Agents

AI risk frameworks such as the OWASP Multi Agentic Threat Modeling Guide and the Cloud Security Alliance’ MAESTRO framework can be mapped into STRIDE + ML to provide a clearer view of AI agent threats. This mapping reveals that a significant portion of AI agent threats can be categorized using traditional STRIDE, but a notable percentage require the additional ML categories.

Two donut charts comparing threat modeling frameworks for AI agents from the Cloud Security Alliance and OWASP: the left chart shows an agentic threat modeling distribution with segments for spoofing, tampering, repudiation, information disclosure, denial of service, elevation of privilege, misunderstanding, and other risks, with STRIDE accounting for 76% and ML for 24%; the right chart shows a multi‑agent threat modeling guide with similar categories, indicating 68% STRIDE and 32% ML, highlighting considerations for AI firewall and Guardian Agent security.

AI Agent Threats and Mitigations

AI agent threats can be categorized, and mitigations can be mapped to these categories. Some threats can be mitigated using existing cybersecurity measures, while others require extending capabilities or implementing new mitigations.

We use the OWASP AI Agent threat taxonomy as it is more concise compared to the Cloud Security Alliance taxonomy. Almost all of the threats in the Cloud Security Alliance threat taxonomy can be categorized into the OWASP taxonomy.

User interface screenshot dividing AI agent threats into three categories—threats already mitigated, threats where capabilities can be extended, and threats unique to AI agents—under headings 'Utilize your Existing Mitigations' and 'Explore options for Mitigations', to guide AI firewall and Guardian Agent defense planning.
  • Threats with Existing Mitigations:
    • Spoofing (T9 – Identity Spoofing)
    • Repudiation (T8 – Repudiation and Untraceability)
    • Information Disclosure (T12 – Agent Communication Poisoning)
    • Denial of Service (T4 – Resource Overload)
    • Elevation of Privilege (T3 – Privilege Compromise)
  • Threats Requiring Expanded Mitigations: This category includes
    • Spoofing (T13 – Rogue Agents)
    • Tampering (T11 – Unexpected RCE and Code Attacks, T1 – Memory Poisoning)
    • Denial of Service (T10 – Overwhelming HITL)
    • Elevation of Privilege (T14 – Human attacks on MAS)
  • Threats Requiring New Mitigations: These are unique to AI agents
    • Misunderstanding (T2 – Tool Misuse, T5 – Cascading Hallucinations, T6 – Intent Breaking & Goal Manipulation)
    • Lack of Accountability (T7 – Misaligned & Deceptive Behaviour, T15 – Human Trust Manipulation)

Mitigation Options and AI-Specific Considerations

The Cloud Security Alliance provides a good list of mitigations to focus on. Many of the mitigations should be part of any robust cybersecurity program regardless if AI is used or not. Below are the additional mitigations required specifically for AI systems.

Agentic AI Safety Mitigations
Mitigation Description Example
Adversarial Training Train agents to be robust against adversarial examples. During the model training process, add examples of prompts trying to get toxic responses about ageism. These should be labeled so the model knows that this type of prompt should not be answered.
Formal Verification Use formal methods to verify agent behavior and ensure goal alignment. Given the intent of an agent is only to provide information and analysis about a customer’s bank account, regularly audit that the agent is not attempting unexpected activity like transferring funds.
Explainable AI (XAI) Improve transparency in agent decision-making to facilitate auditing. Be able to explain why an insurance claim agent denied a specific customer’s claim.
Red Teaming Simulate attacks to identify vulnerabilities. Research the latest prompt injection techniques and test whether they are successful on your system.
Safety Monitoring Implement runtime monitoring to detect unsafe agent behaviors. With a platform independent of the agent, verify that incoming prompts are not jailbreaking attempts or efforts to make the agent perform unethical actions such as illegal or discriminatory behavior.