A Systemic Evaluation of Multimodal RAG Privacy
This addresses privacy concerns for users and organizations deploying mRAG systems, though it is incremental as it highlights known risks without proposing new solutions.
The paper tackles privacy risks in multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision tasks by empirically analyzing how standard prompting can leak private information like image inclusion and metadata from datasets.
The growing adoption of multimodal Retrieval-Augmented Generation (mRAG) pipelines for vision-centric tasks (e.g. visual QA) introduces important privacy challenges. In particular, while mRAG provides a practical capability to connect private datasets to improve model performance, it risks the leakage of private information from these datasets during inference. In this paper, we perform an empirical study to analyze the privacy risks inherent in the mRAG pipeline observed through standard model prompting. Specifically, we implement a case study that attempts to infer the inclusion of a visual asset, e.g. image, in the mRAG, and if present leak the metadata, e.g. caption, related to it. Our findings highlight the need for privacy-preserving mechanisms and motivate future research on mRAG privacy.