Security Glossary — LLM Prompt Injection Attack Taxonomy

// ATTACK SURFACE COVERAGE

Attack Categories

All 10 categories tested in the Ciphvex full security assessment. Mapped to OWASP LLM Top 10.

CAT-01CRITICAL

Direct Instruction Override

Adversarial inputs crafted to override a model's system prompt and redirect execution to attacker-controlled instructions. Attacks in this category exploit the model's inability to reliably distinguish between authoritative system instructions and user-supplied content, enabling full instruction substitution, goal hijacking, and policy override. These are the most direct expression of OWASP LLM01:2025 — Prompt Injection.

↑ Back to top

CAT-02HIGH

Role-Playing & Authority Confusion

DAN-style jailbreaks, developer console impersonation, fictional framing attacks, and persona-override techniques that coerce the model into bypassing safety constraints by adopting an alternative identity. The model is persuaded it is operating in a context where its normal rules do not apply — as a different AI, a fictional character, or an authority persona with elevated permissions.

↑ Back to top

CAT-03HIGH

Delimiter & Structured-Format Injection

Attacks that exploit structured-format parsing boundaries — JSON keys, XML tags, markdown headers, prompt delimiters, and template variables — to escape an intended context boundary or override a trusted field. When models process structured formats as instructions, an attacker who can inject into those formats can hijack the interpretation of the surrounding context.

↑ Back to top

CAT-04HIGH

Multi-Turn Persistence & Memory Poisoning

Memory-seeding and persistence attacks that plant malicious instructions in early conversation turns, stored memory, or session state, intending those instructions to be recalled and executed in later turns. As LLM applications gain memory and multi-session context, the attack surface expands from the current prompt to everything the model has been trained or told to remember.

↑ Back to top

CAT-05CRITICAL

Indirect & Second-Order Injection

Malicious instructions embedded in external content that the model processes indirectly — documents, web pages, email bodies, database records, tool call outputs, or RAG-retrieved chunks. The attacker does not interact with the model directly; instead they poison the data sources the model trusts, causing the model to execute attacker instructions when it retrieves or summarises that content.

↑ Back to top

CAT-06CRITICAL

Data Exfiltration & Prompt Leakage

System prompt extraction, PII exfiltration, confidential context extraction, and canary detection attacks. The attacker attempts to cause the model to reproduce contents it was instructed to keep confidential — including the system prompt itself, injected tool outputs, user data, or API responses. A successful extraction demonstrates that confidentiality boundaries in the model's context window are not enforced.

↑ Back to top

CAT-07HIGH

Jailbreaks & Safety Bypass

Jailbreak attempts, role-play pivots, hypothetical framing, and payload-obfuscation techniques designed to coerce the model into generating unsafe, harmful, or policy-violating outputs. Unlike role-playing attacks, jailbreaks specifically target content-safety filters rather than identity or permission boundaries. The goal is content policy violation — generating disallowed material, not impersonating an authority.

↑ Back to top

CAT-08CRITICAL

Tool Misuse & Action Escalation

Attacks that hijack function-call selection, escalate tool permissions, chain tool calls in unintended ways, or trigger unintended agentic actions. As LLMs are given access to external tools — APIs, file systems, code execution environments — the prompt injection attack surface expands to include arbitrary side effects in connected systems. A successful attack may exfiltrate data, modify files, send messages, or execute code.

↑ Back to top

CAT-09HIGH

Encoding, Obfuscation & Translation

Encoded, obfuscated, or translated payloads that disguise malicious content from safety filters, classifiers, and human reviewers. Techniques include Base64 encoding, Unicode homoglyphs, ROT13, language translation, leetspeak, whitespace insertion, and token fragmentation. The payload is semantically identical to a clear-text attack but bypasses pattern-matching and keyword-based defences.

↑ Back to top

CAT-10HIGH

Retrieval, Citation & Source-Manipulation

Source poisoning, citation steering, and retrieval manipulation attacks that cause the model to trust and reproduce attacker-shaped content retrieved from external knowledge bases. In RAG-augmented systems, the model's outputs are only as trustworthy as its retrieval pipeline. An attacker who can influence retrieved documents can steer model behaviour, fabricate citations, or inject instructions that appear to originate from authoritative sources.

↑ Back to top

// COMPLIANCE & REGULATORY

Regulatory Frameworks

Standards and regulations that require documented LLM security evidence.

OWASP LLM Top 10

The OWASP Top 10 for Large Language Model Applications is the industry-standard risk classification for LLM vulnerabilities, maintained by the Open Web Application Security Project. LLM01:2025 — Prompt Injection is ranked the #1 LLM threat. The framework provides a common vocabulary for security teams, auditors, and compliance buyers assessing LLM-enabled products. Ciphvex audit findings are mapped to OWASP LLM Top 10 categories by default.

↑ Back to top

EU AI Act

Regulation (EU) 2024/1689 — the EU AI Act — establishes binding obligations for AI systems across the European Union. High-risk AI systems (including many customer-facing LLM applications) require mandatory conformity assessments, risk management systems, and technical documentation before deployment. Security testing evidence is a required input to conformity assessment. Full rollout for high-risk systems begins August 2026, with general-purpose AI model obligations applying from August 2025.

↑ Back to top

ISO/IEC 42001

ISO/IEC 42001:2023 is the international standard for AI Management Systems (AIMS). It specifies requirements for establishing, implementing, maintaining, and continually improving an AI management system. Security testing and adversarial robustness evaluation are required controls under the standard. Organisations seeking ISO/IEC 42001 certification must demonstrate systematic approaches to AI risk assessment, which includes documented evidence of security testing for LLM applications.

↑ Back to top

SOC 2

SOC 2 (System and Organization Controls 2) is an auditing standard developed by the AICPA for service organisations that store, process, or transmit customer data. The Security trust service criterion requires controls that protect against unauthorised access — including the AI systems used to process that data. Enterprise customers and procurement teams routinely request SOC 2 reports as part of vendor security questionnaires, and increasingly include specific questions about LLM security controls and AI safety testing.

↑ Back to top

Audit your LLM against all 10 categories.

Start with a free Mini-Scan — 10 tests, no payment required, email report in 48 hours.

GET FREE MINI-SCAN →