Attack Categories
All 10 categories tested in the Ciphvex full security assessment. Mapped to OWASP LLM Top 10.
Direct Instruction Override
Adversarial inputs crafted to override a model's system prompt and redirect execution to attacker-controlled instructions. Attacks in this category exploit the model's inability to reliably distinguish between authoritative system instructions and user-supplied content, enabling full instruction substitution, goal hijacking, and policy override. These are the most direct expression of OWASP LLM01:2025 — Prompt Injection.
↑ Back to topDelimiter & Structured-Format Injection
Attacks that exploit structured-format parsing boundaries — JSON keys, XML tags, markdown headers, prompt delimiters, and template variables — to escape an intended context boundary or override a trusted field. When models process structured formats as instructions, an attacker who can inject into those formats can hijack the interpretation of the surrounding context.
↑ Back to topMulti-Turn Persistence & Memory Poisoning
Memory-seeding and persistence attacks that plant malicious instructions in early conversation turns, stored memory, or session state, intending those instructions to be recalled and executed in later turns. As LLM applications gain memory and multi-session context, the attack surface expands from the current prompt to everything the model has been trained or told to remember.
↑ Back to topIndirect & Second-Order Injection
Malicious instructions embedded in external content that the model processes indirectly — documents, web pages, email bodies, database records, tool call outputs, or RAG-retrieved chunks. The attacker does not interact with the model directly; instead they poison the data sources the model trusts, causing the model to execute attacker instructions when it retrieves or summarises that content.
↑ Back to topData Exfiltration & Prompt Leakage
System prompt extraction, PII exfiltration, confidential context extraction, and canary detection attacks. The attacker attempts to cause the model to reproduce contents it was instructed to keep confidential — including the system prompt itself, injected tool outputs, user data, or API responses. A successful extraction demonstrates that confidentiality boundaries in the model's context window are not enforced.
↑ Back to topJailbreaks & Safety Bypass
Jailbreak attempts, role-play pivots, hypothetical framing, and payload-obfuscation techniques designed to coerce the model into generating unsafe, harmful, or policy-violating outputs. Unlike role-playing attacks, jailbreaks specifically target content-safety filters rather than identity or permission boundaries. The goal is content policy violation — generating disallowed material, not impersonating an authority.
↑ Back to topTool Misuse & Action Escalation
Attacks that hijack function-call selection, escalate tool permissions, chain tool calls in unintended ways, or trigger unintended agentic actions. As LLMs are given access to external tools — APIs, file systems, code execution environments — the prompt injection attack surface expands to include arbitrary side effects in connected systems. A successful attack may exfiltrate data, modify files, send messages, or execute code.
↑ Back to topEncoding, Obfuscation & Translation
Encoded, obfuscated, or translated payloads that disguise malicious content from safety filters, classifiers, and human reviewers. Techniques include Base64 encoding, Unicode homoglyphs, ROT13, language translation, leetspeak, whitespace insertion, and token fragmentation. The payload is semantically identical to a clear-text attack but bypasses pattern-matching and keyword-based defences.
↑ Back to topRetrieval, Citation & Source-Manipulation
Source poisoning, citation steering, and retrieval manipulation attacks that cause the model to trust and reproduce attacker-shaped content retrieved from external knowledge bases. In RAG-augmented systems, the model's outputs are only as trustworthy as its retrieval pipeline. An attacker who can influence retrieved documents can steer model behaviour, fabricate citations, or inject instructions that appear to originate from authoritative sources.
↑ Back to topRegulatory Frameworks
Standards and regulations that require documented LLM security evidence.
OWASP LLM Top 10
The OWASP Top 10 for Large Language Model Applications is the industry-standard risk classification for LLM vulnerabilities, maintained by the Open Web Application Security Project. LLM01:2025 — Prompt Injection is ranked the #1 LLM threat. The framework provides a common vocabulary for security teams, auditors, and compliance buyers assessing LLM-enabled products. Ciphvex audit findings are mapped to OWASP LLM Top 10 categories by default.
↑ Back to topEU AI Act
Regulation (EU) 2024/1689 — the EU AI Act — establishes binding obligations for AI systems across the European Union. High-risk AI systems (including many customer-facing LLM applications) require mandatory conformity assessments, risk management systems, and technical documentation before deployment. Security testing evidence is a required input to conformity assessment. Full rollout for high-risk systems begins August 2026, with general-purpose AI model obligations applying from August 2025.
↑ Back to topISO/IEC 42001
ISO/IEC 42001:2023 is the international standard for AI Management Systems (AIMS). It specifies requirements for establishing, implementing, maintaining, and continually improving an AI management system. Security testing and adversarial robustness evaluation are required controls under the standard. Organisations seeking ISO/IEC 42001 certification must demonstrate systematic approaches to AI risk assessment, which includes documented evidence of security testing for LLM applications.
↑ Back to topSOC 2
SOC 2 (System and Organization Controls 2) is an auditing standard developed by the AICPA for service organisations that store, process, or transmit customer data. The Security trust service criterion requires controls that protect against unauthorised access — including the AI systems used to process that data. Enterprise customers and procurement teams routinely request SOC 2 reports as part of vendor security questionnaires, and increasingly include specific questions about LLM security controls and AI safety testing.
↑ Back to topStart with a free Mini-Scan — 10 tests, no payment required, email report in 48 hours.
GET FREE MINI-SCAN →