CR | CL | LG | Jul 2, 2024

Towards More Realistic Extraction Attacks: An Adversarial Perspective

arXiv:2407.02596v3 | 9 citations | h-index: 17
AI Analysis

This work addresses security vulnerabilities in language models for AI practitioners and researchers, but it is incremental as it builds on existing extraction attack methods by expanding the attack surface.

The paper tackles the problem of data extraction attacks on language models by considering a more realistic adversarial perspective with multifaceted access, and finds that combining multiple attacks doubles extraction risks, persisting even under mitigation strategies like data deduplication.

Language models are prone to memorizing their training data, making them vulnerable to extraction attacks. While existing research often examines isolated setups, such as a single model or a fixed prompt, real-world adversaries have a considerably larger attack surface due to access to models across various sizes and checkpoints, and repeated prompting. In this paper, we revisit extraction attacks from an adversarial perspective -- with multi-faceted access to the underlying data. We find significant churn in extraction trends, i.e., even unintuitive changes to the prompt, or targeting smaller models and earlier checkpoints, can extract distinct information. By combining multiple attacks, our adversary doubles ($2 \times$) the extraction risks, persisting even under mitigation strategies like data deduplication. We conclude with four case studies, including detecting pre-training data, copyright violations, extracting personally identifiable information, and attacking closed-source models, showing how our more realistic adversary can outperform existing adversaries in the literature.
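The "combining multiple attacks" idea in the abstract can be read as taking the union of the training samples that each individual attack configuration extracts. The following is a minimal sketch under that assumption, not the authors' implementation. The function names, the attack labels, and the toy sample IDs are all hypothetical and chosen only so the arithmetic is easy to follow. The numbers are invented and are not results from the paper.

```python
from typing import Dict, Set


def extraction_rate(extracted: Set[int], total: int) -> float:
    """Fraction of the candidate training samples that were extracted."""
    return len(extracted) / total


def combined_extraction(per_attack: Dict[str, Set[int]]) -> Set[int]:
    """Union of extracted sample IDs across all attack configurations.

    Assumption: the adversary counts a sample as extracted if ANY
    configuration (model size, checkpoint, prompt variant) recovers it.
    """
    combined: Set[int] = set()
    for ids in per_attack.values():
        combined |= ids
    return combined


# Toy data (invented): IDs of samples each hypothetical attack extracts
# out of TOTAL candidate training samples.
TOTAL = 10
per_attack = {
    "large_model_final_checkpoint": {0, 1, 2},
    "small_model_final_checkpoint": {2, 3},
    "large_model_early_checkpoint": {1, 4},
    "perturbed_prompt": {0, 5},
}

combined = combined_extraction(per_attack)
best_single_rate = max(extraction_rate(ids, TOTAL) for ids in per_attack.values())
combined_rate = extraction_rate(combined, TOTAL)

print(f"best single attack: {best_single_rate:.2f}")
print(f"combined adversary: {combined_rate:.2f}")
```

Because the individual attacks overlap only partially (the "churn" the abstract describes), the union is larger than any single attack's set. In this toy setup the combined rate happens to be twice the best single rate, but that ratio depends entirely on the made-up sets above.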

Code Implementations: 1 repo