CR · CL · May 14

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

arXiv:2605.15118 · 62.0
Predicted impact: top 25% in CR (last 90 days) · Originality: Incremental advance
AI Analysis

For the LLM security community, this work provides a systematic method to identify coverage gaps in attack benchmarks, highlighting that current evaluations are incomplete and fragmented.

The authors introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface. Across six audited public benchmarks, the three primary frameworks (HarmBench, InjecAgent, AgentDojo) occupy non-overlapping cells covering at most 25% of a 4×6 Target × Technique matrix. Entire threat categories lack standardized evaluation, even though published attacks in them achieve 46× token amplification and 96% attack success rates.

We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a 4×6 Target × Technique matrix grounded in STRIDE, constructed from a 507-leaf taxonomy (401 data-populated and 106 threat-model-derived leaves) of inference-time attacks extracted from 932 arXiv security studies (2023–2026). The matrix enables benchmark-external validation, that is, auditing collective coverage rather than individual benchmark consistency. Applying it to six public benchmarks reveals that the three primary frameworks (HarmBench, InjecAgent, AgentDojo) occupy non-overlapping cells covering at most 25% of the matrix. Meanwhile, entire STRIDE threat categories (Service Disruption, Model Internals) lack any standardized evaluation, even though published attacks in these categories achieve 46× token amplification and 96% attack success rates through mechanisms that no benchmark tests. The corpus of 2,521 unique attack groups further reveals pervasive naming fragmentation (up to 29 surface forms for a single attack) and a heavy concentration in Safety & Alignment Bypass, structural properties that are invisible at smaller scale. The taxonomy, attack records, and coverage mappings are released as extensible artifacts. As new benchmarks emerge, they can be mapped onto the same matrix, enabling the community to track whether evaluation gaps are closing.
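The collective coverage audit described in the abstract reduces to set arithmetic over matrix cells. The Python sketch below is a minimal illustration under stated assumptions. Each benchmark is assumed to have already been mapped to a set of (target, technique) cells. The function names `coverage` and `audit`, the integer axis indices, and the toy benchmarks `bench_a`, `bench_b`, `bench_c` are hypothetical. They do not reproduce the paper's released artifacts or its actual benchmark mapping. The sketch shows that "at most 25%" of a 4×6 matrix means at most 6 of its 24 cells, and how a whole row or column with no covered cell corresponds to a category that no benchmark evaluates.

```python
from itertools import combinations

# Assumed shape: the 4 x 6 Target x Technique matrix from the abstract.
# Rows (targets) and columns (techniques) are plain indices here; the real
# labels come from the paper's STRIDE-grounded taxonomy and are not reproduced.
N_TARGETS, N_TECHNIQUES = 4, 6
ALL_CELLS = frozenset((t, k) for t in range(N_TARGETS) for k in range(N_TECHNIQUES))


def coverage(cells):
    """Fraction of the matrix occupied by a set of (target, technique) cells."""
    cells = set(cells)
    outside = cells - ALL_CELLS
    if outside:
        raise ValueError(f"cells outside the matrix: {sorted(outside)}")
    return len(cells) / len(ALL_CELLS)


def audit(benchmarks):
    """Collective (benchmark-external) audit of a {name: set of cells} mapping."""
    union = set().union(*benchmarks.values())
    names = sorted(benchmarks)
    # Pairwise overlaps: cells tested by more than one benchmark.
    overlaps = {
        (a, b): benchmarks[a] & benchmarks[b]
        for a, b in combinations(names, 2)
        if benchmarks[a] & benchmarks[b]
    }
    # Whole rows / columns with no covered cell: categories no benchmark evaluates.
    empty_rows = [t for t in range(N_TARGETS)
                  if all((t, k) not in union for k in range(N_TECHNIQUES))]
    empty_cols = [k for k in range(N_TECHNIQUES)
                  if all((t, k) not in union for t in range(N_TARGETS))]
    return {
        "per_benchmark": {n: coverage(benchmarks[n]) for n in names},
        "union": coverage(union),
        "overlaps": overlaps,
        "empty_rows": empty_rows,
        "empty_cols": empty_cols,
    }


# Toy, hypothetical cell assignments (NOT the paper's actual benchmark mapping).
toy = {
    "bench_a": {(0, 0), (0, 1)},
    "bench_b": {(1, 2), (1, 3)},
    "bench_c": {(2, 4), (2, 5)},
}
report = audit(toy)
print(report["union"])       # 0.25 (6 of 24 cells)
print(report["overlaps"])    # {} (non-overlapping)
print(report["empty_rows"])  # [3]
```

Because the audit operates on the union of all mappings rather than on any single benchmark, adding a newly released benchmark in this sketch amounts to adding one more dictionary entry. This mirrors the extensibility the abstract describes.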

