CR, LG | Apr 13, 2025

ControlNET: A Firewall for RAG-based LLM System

Hongwei Yao, Haoran Shi, Yidou Chen, Yixin Jiang, Cong Wang, Zhan Qin

arXiv:2504.09593v2 | 20.718 citations | h-index: 9 | Has Code

Originality: Highly original

AI Analysis

The paper addresses security risks such as data breaches and data poisoning in RAG-based LLM deployments in sensitive domains such as healthcare and finance, and proposes a novel defense mechanism against them.

The paper tackles security vulnerabilities in Retrieval-Augmented Generation (RAG) systems for LLMs, proposing ControlNET as an AI firewall that detects and mitigates adversarial queries, achieving over 0.909 AUROC in threat detection while preserving system harmlessness.

Retrieval-Augmented Generation (RAG) has significantly enhanced the factual accuracy and domain adaptability of Large Language Models (LLMs). This advancement has enabled their widespread deployment across sensitive domains such as healthcare, finance, and enterprise applications. RAG mitigates hallucinations by integrating external knowledge, yet it introduces privacy and security risks, notably data breaches and data poisoning. While recent studies have explored prompt injection and poisoning attacks, there remains a significant gap in comprehensive research on controlling inbound and outbound query flows to mitigate these threats. In this paper, we propose an AI firewall, ControlNET, designed to safeguard RAG-based LLM systems from these vulnerabilities. ControlNET controls query flows by leveraging activation shift phenomena to detect adversarial queries and mitigates their impact through semantic divergence. We conduct comprehensive experiments on four benchmark datasets (Msmarco, HotpotQA, FinQA, and MedicalSys) using state-of-the-art open-source LLMs (Llama3, Vicuna, and Mistral). Our results demonstrate that ControlNET achieves over 0.909 AUROC in detecting and mitigating security threats while preserving system harmlessness. Overall, ControlNET offers an effective, robust, and harmless defense mechanism, marking a significant advancement toward the secure deployment of RAG-based LLM systems.
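
The abstract describes detecting adversarial queries through activation shift and reports detection quality as AUROC. As a rough illustration only, and not the authors' implementation, the sketch below scores each query by the Euclidean distance between its activation vector and the mean activation of known-benign queries, then computes AUROC by pairwise ranking. The synthetic Gaussian "activations", the distance-based score, and every function name here are assumptions introduced for this example.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def shift_score(activation, benign_mean):
    """Hypothetical shift score: Euclidean distance from the benign mean."""
    return math.sqrt(sum((a - m) ** 2 for a, m in zip(activation, benign_mean)))


def auroc(scores, labels):
    """AUROC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def demo(seed=0, dim=8, n=200):
    """Synthetic stand-in for model activations (assumption: adversarial
    queries shift every activation component by a constant offset)."""
    rng = random.Random(seed)
    benign_ref = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    benign = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    adversarial = [[rng.gauss(1.5, 1.0) for _ in range(dim)] for _ in range(n)]

    # Estimate the benign centre from reference queries only.
    centre = mean_vector(benign_ref)
    scores = [shift_score(v, centre) for v in benign + adversarial]
    labels = [0] * n + [1] * n
    return auroc(scores, labels)


if __name__ == "__main__":
    print(f"AUROC on synthetic data: {demo():.3f}")
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported figure of over 0.909 should be read.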
