CL AI Jul 24, 2025

Safeguarding RAG Pipelines with GMTP: A Gradient-based Masked Token Probability Method for Poisoned Document Detection

San Kim, Jonghwi Kim, Yejin Jeon, Gary Geunbae Lee

arXiv:2507.18202v18.33 citations h-index: 3 ACL

Originality: Incremental advance

AI Analysis

The paper addresses a critical security problem for users of RAG systems by preventing harmful outputs from poisoned knowledge bases, though the defense itself is an incremental advance.

The paper tackles the security risk of poisoned documents in Retrieval-Augmented Generation (RAG) pipelines by proposing GMTP, a gradient-based method that detects and filters out adversarially crafted documents, achieving over 90% elimination of poisoned content while retaining relevant documents.

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by providing external knowledge for accurate and up-to-date responses. However, this reliance on external sources exposes a security risk: attackers can inject poisoned documents into the knowledge base to steer the generation process toward harmful or misleading outputs. In this paper, we propose Gradient-based Masked Token Probability (GMTP), a novel defense method to detect and filter out adversarially crafted documents. Specifically, GMTP identifies high-impact tokens by examining gradients of the retriever's similarity function. These key tokens are then masked, and their probabilities are checked via a Masked Language Model (MLM). Since injected tokens typically exhibit markedly low masked-token probabilities, GMTP can easily detect malicious documents and achieve high-precision filtering. Experiments demonstrate that GMTP eliminates over 90% of poisoned content while retaining relevant documents, thus maintaining robust retrieval and generation performance across diverse datasets and adversarial settings.
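The final scoring and filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the per-token gradient norms (from the retriever's similarity function) and the per-token masked-token probabilities (from an MLM) have already been computed elsewhere. The function names `gmtp_score` and `filter_documents`, the averaging rule, the `top_n` value, the `threshold` value, and all numbers in the example are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def gmtp_score(grad_norms: Sequence[float],
               masked_probs: Sequence[float],
               top_n: int = 2) -> float:
    """Average masked-token probability over the top_n highest-gradient tokens.

    grad_norms[i]   : assumed gradient norm of the similarity score w.r.t. token i
    masked_probs[i] : assumed MLM probability of token i when it is masked
    """
    if len(grad_norms) != len(masked_probs):
        raise ValueError("need one gradient norm and one probability per token")
    if not grad_norms:
        raise ValueError("document has no tokens")
    top_n = min(top_n, len(grad_norms))
    # Indices of the tokens that influence the similarity score the most.
    key_idx = sorted(range(len(grad_norms)),
                     key=lambda i: grad_norms[i], reverse=True)[:top_n]
    return sum(masked_probs[i] for i in key_idx) / top_n


def filter_documents(docs: List[Dict], threshold: float = 0.1,
                     top_n: int = 2) -> List[str]:
    """Return ids of documents whose score is at or above the (hypothetical) threshold."""
    return [d["id"] for d in docs
            if gmtp_score(d["grad_norms"], d["masked_probs"], top_n) >= threshold]


# Made-up inputs: in the poisoned document, the two highest-gradient tokens
# are ones the MLM finds very unlikely in context.
docs = [
    {"id": "clean",
     "grad_norms": [0.2, 0.9, 0.4, 0.7],
     "masked_probs": [0.60, 0.45, 0.80, 0.35]},
    {"id": "poisoned",
     "grad_norms": [0.1, 1.5, 1.8, 0.3],
     "masked_probs": [0.70, 0.002, 0.001, 0.65]},
]
kept = filter_documents(docs)
print(kept)
```

In this toy input, only the document whose high-gradient tokens are also plausible under the MLM passes the filter. In practice the threshold and the number of key tokens would need tuning per retriever and dataset.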
