IRAIMay 12, 2025

GRADA: Graph-based Reranking against Adversarial Documents Attack

arXiv:2505.07546v34 citationsh-index: 11EMNLP
Originality Incremental advance
AI Analysis

This addresses security vulnerabilities in RAG systems for users relying on LLMs for accurate information retrieval, representing a strong incremental improvement in defense mechanisms.

The paper tackles the problem of adversarial attacks in Retrieval Augmented Generation (RAG) frameworks, where adversarial documents manipulate retrieval, and proposes GRADA, a graph-based reranking method that reduces attack success rates by up to 80% on the Natural Questions dataset while maintaining accuracy.

Retrieval Augmented Generation (RAG) frameworks improve the accuracy of large language models (LLMs) by integrating external knowledge from retrieved documents, thereby overcoming the limitations of models' static intrinsic knowledge. However, these systems are susceptible to adversarial attacks that manipulate the retrieval process by introducing documents that are adversarial yet semantically similar to the query. Notably, while these adversarial documents resemble the query, they exhibit weak similarity to benign documents in the retrieval set. Thus, we propose a simple yet effective Graph-based Reranking against Adversarial Document Attacks (GRADA) framework aiming at preserving retrieval quality while significantly reducing the success of adversaries. Our study evaluates the effectiveness of our approach through experiments conducted on five LLMs: GPT-3.5-Turbo, GPT-4o, Llama3.1-8b, Llama3.1-70b, and Qwen2.5-7b. We use three datasets to assess performance, with results from the Natural Questions dataset demonstrating up to an 80% reduction in attack success rates while maintaining minimal loss in accuracy.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes