LGCRDCMar 2, 2024

Defending Against Data Reconstruction Attacks in Federated Learning: An Information Theory Approach

arXiv:2403.01268v110 citationsh-index: 13USENIX Security Symposium
Originality Incremental advance
AI Analysis

This work addresses privacy vulnerabilities in federated learning for users and organizations, offering a method to enhance protection against data reconstruction attacks, though it builds incrementally on existing information theory approaches.

The paper tackles the problem of data reconstruction attacks in federated learning by proving that constraining transmitted information can throttle such attacks, and proposes algorithms to limit information leakage while maintaining model accuracy, validated with real-world datasets.

Federated Learning (FL) trains a black-box and high-dimensional model among different clients by exchanging parameters instead of direct data sharing, which mitigates the privacy leak incurred by machine learning. However, FL still suffers from membership inference attacks (MIA) or data reconstruction attacks (DRA). In particular, an attacker can extract the information from local datasets by constructing DRA, which cannot be effectively throttled by existing techniques, e.g., Differential Privacy (DP). In this paper, we aim to ensure a strong privacy guarantee for FL under DRA. We prove that reconstruction errors under DRA are constrained by the information acquired by an attacker, which means that constraining the transmitted information can effectively throttle DRA. To quantify the information leakage incurred by FL, we establish a channel model, which depends on the upper bound of joint mutual information between the local dataset and multiple transmitted parameters. Moreover, the channel model indicates that the transmitted information can be constrained through data space operation, which can improve training efficiency and the model accuracy under constrained information. According to the channel model, we propose algorithms to constrain the information transmitted in a single round of local training. With a limited number of training rounds, the algorithms ensure that the total amount of transmitted information is limited. Furthermore, our channel model can be applied to various privacy-enhancing techniques (such as DP) to enhance privacy guarantees against DRA. Extensive experiments with real-world datasets validate the effectiveness of our methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes