How Private is Your Attention? Bridging Privacy with In-Context Learning
This work addresses privacy concerns for users of language models in sensitive applications, but it is incremental, as it builds on existing ICL and differential privacy frameworks.
The paper tackles the problem of ensuring formal privacy in in-context learning for transformer models by proposing a differentially private pretraining algorithm for linear attention heads, with theoretical analysis showing a privacy-accuracy trade-off and robustness to adversarial perturbations, supported by simulations.
In-context learning (ICL), the ability of transformer-based models to perform new tasks from examples provided at inference time, has emerged as a hallmark of modern language models. While recent works have investigated the mechanisms underlying ICL, its feasibility under formal privacy constraints remains largely unexplored. In this paper, we propose a differentially private pretraining algorithm for linear attention heads and present the first theoretical analysis of the privacy-accuracy trade-off for ICL in linear regression. Our results characterize the fundamental tension between optimization and privacy-induced noise, formally capturing behaviors observed in private training via iterative methods. Additionally, we show that our method is robust to adversarial perturbations of training prompts, unlike standard ridge regression. All theoretical findings are supported by extensive simulations across diverse settings.
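To make the setting concrete, the sketch below pretrains a single linear attention head on ICL linear-regression prompts using full-batch gradient descent with per-prompt gradient clipping and Gaussian noise (a DP-GD variant in the style of DP-SGD). This is not the paper's algorithm. The simplified readout y_hat = x_q^T Γ (1/n) Σ_i y_i x_i, the Gaussian task distribution, the clipping norm `clip`, the noise multiplier `sigma`, and all function names (`sample_prompts`, `context_features`, `predict`, `dp_pretrain`) are illustrative assumptions. No privacy accounting (mapping `sigma` to an (ε, δ) guarantee) is performed.

```python
import numpy as np


def sample_prompts(num_prompts, n, d, noise_std, rng):
    """Draw ICL linear-regression prompts: n context pairs (x_i, y_i) and one query each.

    Assumption for illustration: every prompt has its own task vector w ~ N(0, I_d).
    """
    X = rng.standard_normal((num_prompts, n, d))
    w = rng.standard_normal((num_prompts, d))
    y = np.einsum("pnd,pd->pn", X, w) + noise_std * rng.standard_normal((num_prompts, n))
    xq = rng.standard_normal((num_prompts, d))
    yq = np.einsum("pd,pd->p", xq, w) + noise_std * rng.standard_normal(num_prompts)
    return X, y, xq, yq


def context_features(X, y):
    """h = (1/n) * sum_i y_i x_i for every prompt, shape (num_prompts, d)."""
    return np.einsum("pn,pnd->pd", y, X) / X.shape[1]


def predict(Gamma, X, y, xq):
    """Hypothetical simplified linear-attention readout: y_hat = x_q^T Gamma h."""
    return np.einsum("pd,de,pe->p", xq, Gamma, context_features(X, y))


def dp_pretrain(X, y, xq, yq, steps, lr, clip, sigma, rng):
    """Noisy full-batch gradient descent on Gamma (DP-GD style sketch).

    Each prompt's gradient is clipped to Frobenius norm `clip`; Gaussian noise with
    standard deviation `sigma * clip` is added to the summed gradient before averaging.
    """
    P, _, d = X.shape
    Gamma = np.zeros((d, d))
    h = context_features(X, y)
    for _ in range(steps):
        resid = np.einsum("pd,de,pe->p", xq, Gamma, h) - yq
        # Per-prompt gradient of 0.5 * resid^2 with respect to Gamma is resid * x_q h^T.
        grads = resid[:, None, None] * xq[:, :, None] * h[:, None, :]
        norms = np.linalg.norm(grads.reshape(P, -1), axis=1)
        scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        clipped_sum = (grads * scale[:, None, None]).sum(axis=0)
        noise = sigma * clip * rng.standard_normal((d, d))
        Gamma = Gamma - lr * (clipped_sum + noise) / P
    return Gamma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 5, 20
    train = sample_prompts(500, n, d, 0.1, rng)
    test = sample_prompts(2000, n, d, 0.1, rng)
    # Larger sigma means more privacy noise per step.
    for sigma in (0.0, 5.0, 50.0):
        Gamma = dp_pretrain(*train, steps=300, lr=0.1, clip=5.0, sigma=sigma, rng=rng)
        mse = np.mean((predict(Gamma, *test[:3]) - test[3]) ** 2)
        print(f"sigma={sigma:5.1f}  test MSE={mse:.3f}")
```

In this toy setup, raising `sigma` injects more noise into every update and tends to raise the test error, which loosely mirrors the optimization-versus-noise tension described above; the paper's actual trade-off is established by its own analysis, not by this sketch.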