LG AI CR | Dec 26, 2024

RAG with Differential Privacy

arXiv:2412.19291v2 | 14.217 citations | h-index: 2 | Has Code | CAI

Originality: Synthesis-oriented

AI Analysis

This work addresses privacy concerns for users of RAG systems that extract knowledge from personal data. It is incremental, however, because it applies an existing privacy technique to a specific domain.

The paper tackles the privacy risks that arise when Retrieval-Augmented Generation (RAG) systems use external documents, and shows that differentially private token generation is a viable solution for private RAG.

Retrieval-Augmented Generation (RAG) has emerged as the dominant technique for providing Large Language Models (LLMs) with fresh and relevant context, mitigating the risk of hallucinations and improving the overall quality of responses in environments with large and fast-moving knowledge bases. However, integrating external documents into the generation process raises significant privacy concerns. Once a document is added to a prompt, there is no way to guarantee that the response will not inadvertently expose confidential data, which can lead to privacy breaches and ethical dilemmas. This paper explores a practical solution to this problem, suitable for general knowledge extraction from personal data. It shows that differentially private token generation is a viable approach to private RAG.
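To make the idea of differentially private token generation more concrete, the following is a minimal sketch of one common way to privatize a single decoding step: the exponential mechanism applied to next-token scores averaged over retrieved documents. It is not taken from the paper and may differ from the authors' method. The function names, the per-document logit vectors, and the clipping bound `clip` are all hypothetical. In a real system each logit vector would come from prompting the LLM with one retrieved document, and the guarantee shown here holds for one token only, so a full response would consume privacy budget at every step.

```python
import math
import random


def dp_token_probs(per_doc_logits, epsilon, clip=1.0):
    """Return exponential-mechanism sampling probabilities for one token.

    per_doc_logits: list of n logit vectors, one per retrieved document
    (hypothetical input; assumed to come from n separate LLM calls).
    """
    n = len(per_doc_logits)
    vocab = len(per_doc_logits[0])
    # Clip every logit to [-clip, clip] so a single document has bounded influence.
    clipped = [[max(-clip, min(clip, x)) for x in row] for row in per_doc_logits]
    # Utility of a token: its mean clipped logit across documents.
    utility = [sum(row[t] for row in clipped) / n for t in range(vocab)]
    # Replacing one document moves each utility by at most 2 * clip / n.
    sensitivity = 2.0 * clip / n
    scale = epsilon / (2.0 * sensitivity)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(utility)
    weights = [math.exp(scale * (u - top)) for u in utility]
    total = sum(weights)
    return [w / total for w in weights]


def dp_next_token(per_doc_logits, epsilon, clip=1.0, rng=None):
    """Sample one token index; epsilon-DP for this single decoding step."""
    rng = rng or random.Random()
    probs = dp_token_probs(per_doc_logits, epsilon, clip)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Three toy "documents" and a three-token vocabulary.
    docs = [[2.0, -1.0, 0.5], [0.3, 0.1, -0.2], [5.0, -5.0, 0.0]]
    print(dp_next_token(docs, epsilon=1.0, rng=random.Random(0)))
```

Under these assumptions, a smaller `epsilon` flattens the sampling distribution toward uniform (more privacy, less fidelity to the documents), while more retrieved documents reduce the sensitivity and let the same `epsilon` preserve more signal.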
