LG · CL · CR · Jun 3, 2024

Seeing the Forest through the Trees: Data Leakage from Partial Transformer Gradients

arXiv:2406.00999v2 · 26 citations
Originality: Highly original
AI Analysis

The paper exposes a novel vulnerability in privacy-preserving training for language models, indicating that current defenses such as differential privacy may be insufficient.

The paper tackles data leakage in distributed machine learning by showing that gradients from even a small fraction of the parameters in Transformer models can be used to reconstruct private training data. Its experiments reveal vulnerabilities in single layers, and in single components holding as few as 0.54% of the parameters.

Recent studies have shown that distributed machine learning is vulnerable to gradient inversion attacks, where private training data can be reconstructed by analyzing the gradients of the models shared in training. Previous attacks established that such reconstructions are possible using gradients from all parameters of the entire model. However, we hypothesize that most of the involved modules, or even their sub-modules, are at risk of training data leakage, and we validate such vulnerabilities in various intermediate layers of language models. Our extensive experiments reveal that gradients from a single Transformer layer, or even a single linear component with 0.54% of the parameters, are susceptible to training data leakage. Additionally, we show that applying differential privacy to gradients during training offers limited protection against the novel vulnerability of data disclosure.
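To make the intuition concrete, here is a minimal sketch of why the gradient of a single linear component can carry its input. It is not the paper's attack, which targets language models. It is a toy, textbook illustration under stated assumptions: one training example, one linear layer with a bias, and a squared-error loss. In that setting the weight gradient is the outer product of the output error and the input, and the bias gradient is the output error itself, so dividing one row of the weight gradient by the matching bias-gradient entry returns the input. All names (`linear_gradients`, `recover_input`, `private_x`) are hypothetical and chosen for illustration only.

```python
import random


def linear_forward(W, b, x):
    """Compute y = W x + b for a single input vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def linear_gradients(W, b, x, target):
    """Gradients of 0.5 * ||W x + b - target||^2 with respect to W and b."""
    y = linear_forward(W, b, x)
    delta = [yi - ti for yi, ti in zip(y, target)]      # dL/dy
    grad_W = [[di * xj for xj in x] for di in delta]    # outer product delta x^T
    grad_b = list(delta)                                # dL/db equals dL/dy
    return grad_W, grad_b


def recover_input(grad_W, grad_b, eps=1e-12):
    """Recover x from the gradients alone: row i of grad_W is grad_b[i] * x."""
    # Use the row with the largest bias gradient for numerical stability.
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    if abs(grad_b[i]) < eps:
        raise ValueError("bias gradient is (near) zero; input is not identifiable")
    return [g / grad_b[i] for g in grad_W[i]]


# Toy setup (assumed values, not taken from the paper).
rng = random.Random(0)
d_in, d_out = 4, 3
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
b = [rng.uniform(-1, 1) for _ in range(d_out)]

private_x = [0.5, -1.25, 2.0, 0.75]   # the "private" input to the layer
target = [0.0, 1.0, -1.0]

# The attacker sees only the gradients of this one layer.
grad_W, grad_b = linear_gradients(W, b, private_x, target)
recovered_x = recover_input(grad_W, grad_b)

print("private  :", private_x)
print("recovered:", recovered_x)
```

In a real Transformer the picture is harder: gradients are summed over tokens and batch elements, the layer input is a hidden state rather than the text itself, and some components have no bias term, which is why practical attacks typically rely on optimization rather than a closed form like this one.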

Code Implementations: 1 repo
