LG · Oct 9, 2023

Little is Enough: Boosting Privacy by Sharing Only Hard Labels in Federated Semi-Supervised Learning

arXiv:2310.05696v4 · 6 citations · h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns in applications where sensitive data is distributed. It is an incremental advance: it builds on federated learning and adds a novel sharing mechanism.

The paper tackles privacy leakage in federated learning by proposing a federated co-training approach that shares only hard labels on a public dataset. This improves privacy while maintaining model quality, and it enables the use of non-gradient-based models such as decision trees and random forests.

In many critical applications, sensitive data is inherently distributed and cannot be centralized due to privacy concerns. A wide range of federated learning approaches have been proposed to train models locally at each client without sharing their sensitive data, typically by exchanging model parameters, or probabilistic predictions (soft labels) on a public dataset or a combination of both. However, these methods still disclose private information and restrict local models to those that can be trained using gradient-based methods. We propose a federated co-training (FedCT) approach that improves privacy by sharing only definitive (hard) labels on a public unlabeled dataset. Clients use a consensus of these shared labels as pseudo-labels for local training. This federated co-training approach empirically enhances privacy without compromising model quality. In addition, it allows the use of local models that are not suitable for parameter aggregation in traditional federated learning, such as gradient-boosted decision trees, rule ensembles, and random forests. Furthermore, we observe that FedCT performs effectively in federated fine-tuning of large language models, where its pseudo-labeling mechanism is particularly beneficial. Empirical evaluations and theoretical analyses suggest its applicability across a range of federated learning scenarios.
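The consensus step described in the abstract can be illustrated with a minimal sketch. The abstract says only that clients use "a consensus" of the shared hard labels, so the majority vote below, the tie-breaking rule, and the function name `consensus_labels` are assumptions made for illustration rather than the authors' exact procedure.

```python
from collections import Counter


def consensus_labels(client_predictions):
    """Combine hard labels from several clients into one pseudo-label per example.

    client_predictions[i][j] is the hard label that client i assigned to
    public example j. Majority voting is an assumed consensus rule; ties
    are broken by picking the smallest label (also an assumption).
    """
    if not client_predictions:
        raise ValueError("need predictions from at least one client")
    n_examples = len(client_predictions[0])
    if any(len(p) != n_examples for p in client_predictions):
        raise ValueError("all clients must label the same public dataset")

    consensus = []
    # zip(*...) walks the public dataset example by example,
    # yielding the votes of all clients for that example.
    for votes in zip(*client_predictions):
        counts = Counter(votes)
        top = max(counts.values())
        consensus.append(min(label for label, c in counts.items() if c == top))
    return consensus


# Three hypothetical clients label four public, unlabeled examples.
preds = [
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [1, 1, 2, 0],
]
pseudo_labels = consensus_labels(preds)
print(pseudo_labels)
```

Only the label lists leave each client in this sketch, never model parameters or class probabilities. Each client would then add the public examples with these pseudo-labels to its local training set, which is why any model that can output a class label, including a random forest, can take part.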

