LearnPlaybook

The AI Hacking Playbook

Hack AI with AI, the ethical way. Hands-on plays for testing LLMs, agents, MCP servers, and RAG. You learn by doing, on ground you are allowed to touch.

16 playsOWASP + ATLAS mappedAttack, then defend

Every play runs against a target you own, a lab you stood up, or a range someone sanctioned. PLAY-00 is the first move, not a formality. We show the pick because we respect the lock: every play closes with the fix.

Pre-Engagement and ROE

PLAY-00Core+

Sign the Scope (ROE and Authorization)

No signature, no test. The one play you run before every other play.

ProcessPTES Pre-Engagement Interactions / NIST SP 800-115 Planning

Recon and Fingerprint

AHP-03Core+

Pull the Curtain: System Prompt Leakage

The system prompt is the app's source code in plain English. Get it to read its own instructions back, and the guardrails stop being hidden.

LLMOWASP LLM07:2025 System Prompt Leakage

AHP-04Advanced+

Fingerprint the Model: Recon and Baseline Scan

Before you hand-craft a single probe, you scan. This play points an open-source LLM vulnerability scanner at your own authorized endpoint, treats it like nmap for language models, and turns a wide automated sweep into a ranked list of weak spots. You learn where the model leaks, where it bends, and where to point the next play, all without writing one payload yourself.

LLMOWASP LLM01 / LLM02 / LLM09

Vuln Analysis and Probe

AHP-05Advanced+

Spill the Secrets: Sensitive Information Disclosure

A model that knows too much and guards too little is a leak waiting to happen. This play probes an authorized LLM for the data it should never surface: PII, embedded secrets, and verbatim training-data fragments. You map what the model remembers, what it infers, and what it parrots back, then hand the defender a redaction and minimization plan.

LLMOWASP LLM02:2025 Sensitive Information Disclosure

AHP-09Edge+

Corrupt the Knowledge: RAG and Vector Poisoning

The model is only as honest as the documents it retrieves. Poison the corpus, not the prompt, and the model repeats your answer with full confidence and a citation. This play builds an owned poisoned vector store and measures the blast radius.

RAGOWASP LLM08:2025 Vector and Embedding Weaknesses (with LLM04 Data and Model Poisoning)

Initial Access

AHP-08Edge+

Poison the Toolbox: MCP Tool Poisoning and Rug-Pull

The agent reads the whole tool description. The user reads a summary. That gap is the attack.

MCPOWASP LLM03:2025 Supply Chain (with LLM01 indirect injection)

Exploitation

AHP-01Core+

Make It Talk: Direct Prompt Injection

The whole prompt is one stream. The model cannot tell the developer's instructions from yours. That gap is the play.

LLMOWASP LLM01:2025 Prompt Injection

AHP-02Advanced+

Smuggle the Instruction: Indirect Prompt Injection

The attacker never talks to the model. The data does it for them.

LLMOWASP LLM01:2025 Prompt Injection

AHP-06Advanced+

Output Is Input: Improper Output Handling

The model is not the target. The thing that trusts the model is the target. When an application renders, queries, or executes whatever the LLM returns, the model becomes a smuggling lane for classic web bugs. This play walks the methodology for finding the sink, not for building the payload.

PipelineOWASP LLM05:2025 Improper Output Handling

AHP-07Advanced+

Hijack the Agent: Excessive Agency and Tool Abuse

The model did not get tricked. The plumbing behind it had no brakes. Turn an agent's own tools against the system it serves, then build the brakes back.

AgentOWASP LLM06:2025 Excessive Agency

AHP-12Edge+

Automate the Campaign: Multi-Turn Red-Team Orchestration

Single prompts find single bugs. Campaigns find the patterns. This play wires an orchestrator to a target, a converter, and a scorer so a multi-turn adversarial run executes itself, scores every reply, and hands you a ranked list of hits instead of a wall of transcripts. Methodology only, no payloads, authorized ranges only.

PipelineOWASP LLM01: Prompt Injection (at scale)

AHP-20Edge+

Smuggle It in a Picture: Multimodal Prompt Injection

A vision or audio model does not draw a line between "content to look at" and "instructions to obey." Paint text into a picture, or blend a perturbation into a sound, and the model reads it as a command. The user never sees it. The model never asks. This play shows the methodology, on your own endpoint, with public tools.

LLMOWASP LLM01:2025 Prompt Injection (multimodal)

Post-Ex and Impact

AHP-10Advanced+

Empty the Wallet: Unbounded Consumption

Most teams meter the front door and forget the meter on the model. A single endpoint with no per-user budget, no output cap, and an autoscaler that says yes to everything is a credit card someone else is holding. This play measures the blast radius before an attacker prices it for you.

PipelineOWASP LLM10:2025 Unbounded Consumption

AHP-11Advanced+

AHP-11: Make It Lie: Misinformation and Slopsquatting

The model does not have to be jailbroken to be dangerous. It just has to be confidently wrong. This play measures the gap between what the model asserts and what is true: invented sources, fake legal cases, and most useful to an attacker, package names that do not exist. When a coding assistant suggests a library that was never published, an adversary can publish it. This is slopsquatting. We probe for it on an authorized range so the defender can pin dependencies before a real build pulls the poison.

LLMOWASP LLM09:2025 Misinformation

Reporting

AHP-13Advanced+

Lock It In: Turn Findings into a CI Regression Gate

A finding you cannot re-run is a finding the next deploy can undo. Make the fix permanent: turn the exploit into a red test in CI.

PipelineOWASP Top 10 for LLM Applications (2025) as test coverage

RPTCore+

Write the Report: Map to OWASP + ATLAS + Mitigation

A finding you cannot map and cannot fix is a war story. Make it a record the client can act on.

ProcessOWASP LLM Top 10 IDs + Prompt Injection Prevention Cheat Sheet

Krypteia AgentComing soon

The playbook is the craft. The agent runs it.

These plays are the manual way. Krypteia is building the autonomous operator that runs them end to end, on authorized targets, so one engineer covers the ground a team used to. A look behind the curtain:

Autonomous multi-agent orchestration runs the chain end to end
Gated to your signed scope, nothing executes outside it
Every finding mapped to OWASP LLM Top 10 and MITRE ATLAS
One operator console for the whole engagement