CR · AI · Oct 17, 2025

SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models

arXiv:2510.15476v2 · 2 citations · h-index: 6 · Has Code
Originality: Highly original
AI Analysis

This work addresses the inconsistent security evaluation of LLMs, giving researchers and developers a common framework for comparing attacks and defenses.

The paper tackles the fragmented state of research on security risks in large language models, particularly jailbreak prompts. It proposes a taxonomy, formalizes threat models, and introduces an open-source toolkit and dataset for standardized evaluation. The dataset, JAILBREAKDB, is described as the largest annotated collection of jailbreak and benign prompts to date.

Large Language Models (LLMs) have rapidly become integral to real-world applications, powering services across diverse sectors. However, their widespread deployment has exposed critical security risks, particularly through jailbreak prompts that can bypass model alignment and induce harmful outputs. Despite intense research into both attack and defense techniques, the field remains fragmented: definitions, threat models, and evaluation criteria vary widely, impeding systematic progress and fair comparison. In this Systematization of Knowledge (SoK), we address these challenges by (1) proposing a holistic, multi-level taxonomy that organizes attacks, defenses, and vulnerabilities in LLM prompt security; (2) formalizing threat models and cost assumptions into machine-readable profiles for reproducible evaluation; (3) introducing an open-source evaluation toolkit for standardized, auditable comparison of attacks and defenses; (4) releasing JAILBREAKDB, the largest annotated dataset of jailbreak and benign prompts to date (available at https://huggingface.co/datasets/youbin2014/JailbreakDB); and (5) presenting a comprehensive evaluation platform and leaderboard of state-of-the-art methods (to be released soon). Our work unifies fragmented research, provides rigorous foundations for future studies, and supports the development of robust, trustworthy LLMs suitable for high-stakes deployment.
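
To make contribution (2) more concrete, the following is a minimal sketch of what a machine-readable threat-model profile could look like. It is not the paper's schema: the class name `ThreatProfile` and every field (`model_access`, `max_queries`, `multi_turn`) are hypothetical, chosen only to illustrate how attacker assumptions and cost budgets might be serialized so that two evaluations can be checked against the same profile.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ThreatProfile:
    """Hypothetical threat-model profile; field names are illustrative only."""
    name: str
    model_access: str  # assumed values: "black-box" or "white-box"
    max_queries: int   # assumed cost budget: queries the attacker may issue
    multi_turn: bool   # whether the attacker may use several conversation turns

    def to_json(self) -> str:
        # Sorted keys give a stable serialization that can be diffed or hashed.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "ThreatProfile":
        # Rebuild the profile from its JSON form; unknown keys raise TypeError.
        return ThreatProfile(**json.loads(text))


profile = ThreatProfile(
    name="black-box-single-turn",
    model_access="black-box",
    max_queries=100,
    multi_turn=False,
)
encoded = profile.to_json()
restored = ThreatProfile.from_json(encoded)
```

Storing the profile next to each reported result would let a reader confirm that two methods were compared under the same attacker assumptions, which is the kind of reproducibility the abstract describes.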

