CRAIDec 14, 2025

Detecting Prompt Injection Attacks Against Application Using Classifiers

arXiv:2512.12583v1
Originality Synthesis-oriented
AI Analysis

This work addresses security vulnerabilities in critical systems and web applications using LLMs, but it is incremental as it applies existing methods to a new dataset.

The paper tackles the problem of detecting prompt injection attacks in LLM-integrated web applications by curating and augmenting a dataset from the HackAPrompt Playground Submissions corpus and training classifiers like LSTM, feed forward neural networks, Random Forest, and Naive Bayes, resulting in improved detection and mitigation to protect applications and systems.

Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTM, feed forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes