Skip to content
Reference

Agentic Offensive AI Security Glossary

Plain-language definitions for the terms that matter in agentic AI security. Built to be the reference you reach for, and the one an answer engine quotes.

Prompt Injection

An attack that hides instructions inside content an AI reads, hijacking its behavior.

Prompt injection is an attack where adversarial instructions are placed in data that a language model processes, causing the model to follow the attacker's instructions instead of the developer's. Unlike SQL injection it targets meaning, not syntax: a sentence that sounds helpful but changes the model's behavior. Direct prompt injection puts the payload in user input; indirect prompt injection hides it in a web page, document, email, or tool output the agent later reads. It is the number one risk in the OWASP Top 10 for LLM Applications.

Indirect Prompt Injection

Prompt injection delivered through third-party content an AI agent retrieves, not typed by the user.

Indirect prompt injection plants malicious instructions in external content, a web page, a PDF, a GitHub issue, an email, an MCP tool response, that an AI agent fetches during a task. The user never sees or types the payload; the agent ingests it while browsing, reading, or calling a tool, then acts on it. Because agents with tool access can read files, send requests, and execute code, an indirect injection can turn a helpful assistant into an exfiltration or remote-execution vector.

MCP Security

Securing the Model Context Protocol layer that connects AI agents to tools and data.

MCP (Model Context Protocol) is the standard that lets AI agents call external tools and data sources. MCP security covers the attack surface this creates: tool-definition injection (a malicious tool description that steers the model), authorization bypass between tool calls, context pollution through tool output, capability creep where an agent accumulates more access than intended, and the supply-chain risk of third-party MCP servers. An MCP server is effectively other people's prompts pointing at other people's code, so it must be threat-modeled as untrusted input to the agent.

Agentic AI Red Teaming

Adversarially testing autonomous AI agents that plan, use tools, and act over many steps.

Agentic AI red teaming is the practice of attacking AI systems that operate autonomously: agents that reason, call tools, write and run code, and chain actions across many turns. It differs from single-prompt LLM testing because the target is non-deterministic and stateful: the most dangerous attacks are multi-turn, gradually shifting context and building trust to bypass guardrails that catch one-shot attempts. Effective agentic red teaming uses autonomous attacker agents (hackbots) that adapt in real time and measure success probabilistically across thousands of probes.

AI Agent Kill Chain

The staged path an attacker follows to compromise and exploit an AI agent.

An AI agent kill chain describes how an attacker moves from initial access to impact against an autonomous agent: reconnaissance of the agent's tools and data flows, injection of a payload (often indirect), escalation by chaining tool calls, and impact such as data exfiltration, record modification, or code execution. Because a jailbroken agent with database or shell access has a blast radius orders of magnitude larger than a chatbot that merely says something inappropriate, mapping the kill chain is central to scoping an agentic engagement.

Jailbreaking

Coercing an AI model past its safety guardrails to produce restricted output or behavior.

Jailbreaking is the act of bypassing a model's alignment and safety training so it produces content or takes actions its guardrails are meant to prevent. Techniques include role-play framing, obfuscation and encoding, many-shot priming, and multi-turn escalation. In an agentic context, a jailbreak is not the end goal but a step: the value to an attacker is what the jailbroken agent can then do with its tools and access.

RAG Poisoning

Corrupting the documents an AI retrieves so its answers serve the attacker.

RAG (Retrieval-Augmented Generation) poisoning attacks the knowledge source rather than the model. By inserting crafted content into a vector database, knowledge base, or any corpus the system retrieves from, an attacker can plant indirect prompt injections, bias answers, or exfiltrate context. It tests the full stack, the retrieval pipeline and data provenance, not just the model sitting on top.

LLM Penetration Testing

Structured offensive security testing of large language models and the apps built on them.

LLM penetration testing is the adversarial assessment of language-model systems: prompt injection, jailbreaking, system-prompt extraction, guardrail bypass, training-data and membership inference, insecure output handling, and excessive agency. Because LLM vulnerabilities are semantic rather than syntactic, the discipline relies on adaptive, probabilistic testing, running many variations and measuring success rates, rather than deterministic scanners built for traditional software.

These are the attack surfaces Krypteia tests. See the daily threat intel, research, or the course.