LG · Jun 26, 2024

QBI: Quantile-Based Bias Initialization for Efficient Private Data Reconstruction in Federated Learning

Micha V. Nowak, Tim P. Bott, David Khachaturov, Frank Puppe, Adrian Krenzer, Amar Hekalo

arXiv:2406.18745v2 · 2.6 · h-index: 28 · Has Code

Originality: Incremental advance

AI Analysis

This work exposes a privacy vulnerability in federated learning by making data reconstruction attacks on user data more efficient. It is an incremental advance, since it builds on prior reconstruction methods.

The paper tackles the problem of private data reconstruction from model updates in federated learning by proposing QBI, a bias initialization method that enhances reconstruction capabilities, achieving gains of up to 50% on ImageNet and 60% on IMDB sentiment analysis datasets.

Federated learning enables the training of machine learning models on distributed data without compromising user privacy, as data remains on personal devices and only model updates, such as gradients, are shared with a central coordinator. However, recent research has shown that the central entity can perfectly reconstruct private data from shared model updates by maliciously initializing the model's parameters. In this paper, we propose QBI, a novel bias initialization method that significantly enhances reconstruction capabilities. This is accomplished by directly solving for bias values yielding sparse activation patterns. Further, we propose PAIRS, an algorithm that builds on QBI. PAIRS can be deployed when a separate dataset from the target domain is available to further increase the percentage of data that can be fully recovered. Measured by the percentage of samples that can be perfectly reconstructed from batches of various sizes, our approach achieves significant improvements over previous methods with gains of up to 50% on ImageNet and up to 60% on the IMDB sentiment analysis text dataset. Furthermore, we establish theoretical limits for attacks leveraging stochastic gradient sparsity, providing a foundation for understanding the fundamental constraints of these attacks. We empirically assess these limits using synthetic datasets. Finally, we propose and evaluate AGGP, a defensive framework designed to prevent gradient sparsity attacks, contributing to the development of more secure and private federated learning systems.
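The abstract states only that QBI solves directly for bias values that yield sparse activation patterns. As a rough illustration of why sparsity matters, the following is a minimal sketch, not the authors' implementation. It assumes standardized inputs, Gaussian weights, a single ReLU layer, and a toy loss (the sum of the layer outputs); the helper names `quantile_bias`, `layer_gradients`, and `reconstruct` are hypothetical. Each bias is placed at the negated (1 - 1/B) quantile of the neuron's pre-activation distribution, so a neuron fires for about one sample per batch of size B. When exactly one sample activates a neuron, dividing that neuron's weight gradient by its bias gradient returns the sample.

```python
import math
import random
from statistics import NormalDist


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quantile_bias(weight_row, batch_size):
    """Bias that lets roughly 1/batch_size of inputs pass the ReLU.

    Assumption (not from the paper): inputs are standardized, so the
    pre-activation w.x is approximately N(0, ||w||^2). Needs batch_size >= 2.
    """
    scale = math.sqrt(dot(weight_row, weight_row))
    return -scale * NormalDist().inv_cdf(1.0 - 1.0 / batch_size)


def layer_gradients(weights, biases, batch):
    """Batch-summed gradients of the toy loss L = sum(ReLU(Wx + b))."""
    grad_w, grad_b = [], []
    for w, b in zip(weights, biases):
        active = [x for x in batch if dot(w, x) + b > 0]
        # dL/db counts the active samples; dL/dw sums the active inputs.
        grad_b.append(float(len(active)))
        grad_w.append([sum(x[d] for x in active) for d in range(len(w))])
    return grad_w, grad_b


def reconstruct(grad_w, grad_b):
    """Candidate inputs: weight-gradient row divided by bias gradient.

    The candidate equals a real sample only when a single sample activated
    the neuron; otherwise it is an average of several samples.
    """
    return [[g / gb for g in gw] for gw, gb in zip(grad_w, grad_b) if gb != 0]


random.seed(0)
dim, n_neurons, batch_size = 16, 64, 8
weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_neurons)]
biases = [quantile_bias(w, batch_size) for w in weights]
batch = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(batch_size)]

grad_w, grad_b = layer_gradients(weights, biases, batch)
candidates = reconstruct(grad_w, grad_b)
recovered = sum(
    any(all(abs(c - v) < 1e-9 for c, v in zip(cand, x)) for cand in candidates)
    for x in batch
)
print(f"{recovered}/{batch_size} batch samples recovered exactly")
```

With a real loss the per-sample upstream gradient is not 1, but it appears in both the weight and bias gradients and cancels in the ratio whenever only one sample is active. That is why the fraction of neurons with exactly one active sample drives how much of a batch can be recovered.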
