Membership Inference Attacks for Retrieval Based In-Context Learning for Document Question Answering
For users and providers of remote in-context learning services, this work reveals a new privacy risk and provides both attack and defense methods.
The paper demonstrates that retrieval-augmented in-context learning for document QA is vulnerable to membership inference attacks. It proposes two black-box attacks that outperform prior methods; the second uses a weighted-averaging scheme and is more resilient to paraphrasing. An ensemble prompting defense mitigates the attacks.
We show that remotely hosted applications that employ in-context learning, when augmented with a retrieval function to select in-context examples, can be vulnerable to membership inference attacks even when the service provider and users are separate parties. We propose two black-box membership inference attacks that exploit query text prefixes to distinguish member from non-member inputs. The first attack uses a reference model to estimate an otherwise unavailable loss metric. The second attack improves upon it by eliminating the reference model and instead computing a membership statistic through a simple but novel weighted-averaging scheme. Our comprehensive empirical evaluations consider a stricter case in which the adversary has only a paraphrased version of the text in the queries, and show that our attacks can exhibit stronger resilience to paraphrasing and outperform three prior attacks in many cases with a small number of prefixes. We also adapt an existing ensemble prompting defense to our setting, demonstrating that it substantially mitigates the privacy leakage caused by our second attack.
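To make the prefix-based idea concrete, here is a minimal sketch of how a reference-free membership statistic could be computed from several query prefixes. It is an illustration under stated assumptions, not the scheme from the paper: the prefix fractions, the use of prefix length as the weight, the string-similarity score, and the names `prefix_membership_score` and `make_toy_service` are all hypothetical, and the "service" is a toy stand-in for a remote retrieval-augmented model.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence


def prefix_membership_score(
    text: str,
    query_fn: Callable[[str], str],
    fractions: Sequence[float] = (0.25, 0.5, 0.75),
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average of how well the service completes each prefix of `text`.

    Assumption: a higher score suggests `text` is more likely to be in the
    retrieval database. The weighting below is illustrative only.
    """
    words = text.split()
    if len(words) < 2:
        raise ValueError("text must contain at least two words")

    scores = []
    for frac in fractions:
        # Keep at least one word in the prefix and one in the held-out suffix.
        cut = max(1, min(len(words) - 1, int(len(words) * frac)))
        prefix = " ".join(words[:cut])
        suffix = " ".join(words[cut:])
        response = query_fn(prefix)  # black-box access: text in, text out
        scores.append(SequenceMatcher(None, response, suffix).ratio())

    if weights is None:
        # Hypothetical choice: longer prefixes get proportionally more weight.
        weights = list(fractions)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def make_toy_service(stored_docs: Sequence[str]) -> Callable[[str], str]:
    """Toy stand-in for a remote service: it continues stored documents verbatim."""
    def query_fn(prefix: str) -> str:
        for doc in stored_docs:
            if doc.startswith(prefix):
                return doc[len(prefix):].strip()
        return "I do not have enough information to answer."
    return query_fn


member = "the quick brown fox jumps over the lazy dog near the river bank"
non_member = "a completely different sentence that was never stored in the retrieval database"

service = make_toy_service([member])
member_score = prefix_membership_score(member, service)
non_member_score = prefix_membership_score(non_member, service)
```

In this toy setting the member text scores higher than the non-member text, so a threshold on the score would separate them. A real service would not reproduce stored text verbatim, and the paraphrased-query setting evaluated in the paper is not modeled here.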