CL AIJun 5, 2025

Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering

arXiv:2506.06384v16.76 citationsh-index: 9KSEM

Originality Incremental advance

AI Analysis

This addresses a security threat for users of large language models, but it is incremental as it builds on existing detection methods by combining approaches.

The paper tackles prompt injection attacks on LLMs by proposing DMPI-PMHFE, a dual-channel detection framework that integrates a pre-trained model with heuristic feature engineering, achieving improved accuracy, recall, and F1-score on benchmark datasets and reducing attack success rates across multiple LLMs.

With the widespread adoption of Large Language Models (LLMs), prompt injection attacks have emerged as a significant security threat. Existing defense mechanisms often face critical trade-offs between effectiveness and generalizability. This highlights the urgent need for efficient prompt injection detection methods that are applicable across a wide range of LLMs. To address this challenge, we propose DMPI-PMHFE, a dual-channel feature fusion detection framework. It integrates a pretrained language model with heuristic feature engineering to detect prompt injection attacks. Specifically, the framework employs DeBERTa-v3-base as a feature extractor to transform input text into semantic vectors enriched with contextual information. In parallel, we design heuristic rules based on known attack patterns to extract explicit structural features commonly observed in attacks. Features from both channels are subsequently fused and passed through a fully connected neural network to produce the final prediction. This dual-channel approach mitigates the limitations of relying only on DeBERTa to extract features. Experimental results on diverse benchmark datasets demonstrate that DMPI-PMHFE outperforms existing methods in terms of accuracy, recall, and F1-score. Furthermore, when deployed actually, it significantly reduces attack success rates across mainstream LLMs, including GLM-4, LLaMA 3, Qwen 2.5, and GPT-4o.

View on arXiv PDF

Similar