CRCLApr 20

Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection

arXiv:2604.1824858.1h-index: 2Has Code
Predicted impact top 38% in CR · last 90 daysOriginality Incremental advance
AI Analysis

For LLM security practitioners, this work introduces novel detection mechanisms that address known failure modes of existing detectors, though only three techniques are implemented and evaluated.

The paper proposes seven cross-domain techniques for prompt injection detection, implementing three in prompt-shield v0.4.1. The local-alignment detector improves F1 on deepset from 0.033 to 0.378 with zero false positives, and the stylometric detector adds 11.1 percentage points of F1 on an indirect-injection benchmark.

Current open-source prompt-injection detectors converge on two architectural choices: regular-expression pattern matching and fine-tuned transformer classifiers. Both share failure modes that recent work has made concrete. Regular expressions miss paraphrased attacks. Fine-tuned classifiers are vulnerable to adaptive adversaries: a 2025 NAACL Findings study reported that eight published indirect-injection defenses were bypassed with greater than fifty percent attack success rates under adaptive attacks. This work proposes seven detection techniques that each port a specific mechanism from a discipline outside large-language-model security: forensic linguistics, materials-science fatigue analysis, deception technology from network security, local-sequence alignment from bioinformatics, mechanism design from economics, spectral signal analysis from epidemiology, and taint tracking from compiler theory. Three of the seven techniques are implemented in the prompt-shield v0.4.1 release (Apache 2.0) and evaluated in a four-configuration ablation across six datasets including deepset/prompt-injections, NotInject, LLMail-Inject, AgentHarm, and AgentDojo. The local-alignment detector lifts F1 on deepset from 0.033 to 0.378 with zero additional false positives. The stylometric detector adds 11.1 percentage points of F1 on an indirect-injection benchmark. The fatigue tracker is validated via a probing-campaign integration test. All code, data, and reproduction scripts are released under Apache 2.0.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes