LG AI CV | Jul 17, 2023

Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels

Yae Jee Cho, Gauri Joshi, Dimitrios Dimitriadis

arXiv:2307.08809v18.811 citations | h-index: 25

Originality: Incremental advance

AI Analysis

This work addresses the cost of data labeling and the heterogeneity of client data in federated learning. It offers a practical solution for clients with limited labels, though it is an incremental improvement over existing semi-supervised FL methods.

The paper tackles federated learning with limited labeled data by proposing FedLabel, in which each client selectively uses either its local model or the global model to pseudo-label its unlabeled data and applies global-local consistency regularization. It reports gains of 8-24% over semi-supervised FL baselines and outperforms fully supervised FL baselines while using only 5-20% labeled data.

Many existing federated learning (FL) methods assume clients with fully labeled data, whereas in realistic settings clients have limited labels because labeling is expensive and laborious. With little labeled local data, a client's local model often generalizes poorly to its larger unlabeled local data, for example when the class distribution of the labeled data does not match that of the unlabeled data. As a result, clients may instead look to the global model trained across clients to leverage their unlabeled data, but this is also difficult because of data heterogeneity across clients. In our work, we propose FedLabel, where clients selectively choose the local or global model to pseudo-label their unlabeled data, depending on which is more of an expert on the data. We further utilize both the local and global models' knowledge via global-local consistency regularization, which minimizes the divergence between the two models' outputs when they have identical pseudo-labels for the unlabeled data. Unlike other semi-supervised FL baselines, our method does not require additional experts other than the local or global model, nor does it require additional parameters to be communicated. We also do not assume any server-labeled data or fully labeled clients. For both cross-device and cross-silo settings, we show that FedLabel outperforms other semi-supervised FL baselines by $8$-$24\%$, and even outperforms standard fully supervised FL baselines ($100\%$ labeled data) with only $5$-$20\%$ of labeled data.
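The two mechanisms described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how the "expert" is determined or which divergence is used, so the sketch assumes that the more confident model (higher maximum softmax probability) is the expert, that a hypothetical confidence threshold of 0.9 filters weak pseudo-labels, and that the divergence is the KL divergence. The function names are illustrative only.

```python
import math

EPS = 1e-12  # guards against log(0) when a probability underflows


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(probs):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


def select_pseudo_label(local_logits, global_logits, threshold=0.9):
    """Pick the pseudo-label from whichever model is more confident.

    ASSUMPTION: "expert" is approximated by the higher max softmax
    probability, and `threshold` is a hypothetical cutoff; neither is
    specified in the abstract.
    Returns (label or None, name of the model that was chosen).
    """
    p_local, p_global = softmax(local_logits), softmax(global_logits)
    source = "global" if max(p_global) > max(p_local) else "local"
    probs = p_global if source == "global" else p_local
    if max(probs) < threshold:
        return None, source  # too uncertain: skip this sample
    return argmax(probs), source


def consistency_loss(local_logits, global_logits):
    """KL(global || local), applied only when both models agree on the label.

    ASSUMPTION: KL divergence stands in for the unspecified "divergence".
    """
    p_local, p_global = softmax(local_logits), softmax(global_logits)
    if argmax(p_local) != argmax(p_global):
        return 0.0  # pseudo-labels differ: no consistency term
    return sum(
        g * (math.log(g + EPS) - math.log(l + EPS))
        for g, l in zip(p_global, p_local)
    )
```

In a training loop, a client would call `select_pseudo_label` on each unlabeled sample to obtain a target for its supervised-style loss, and add `consistency_loss` as a regularizer for samples on which the two models already agree. Only the model outputs are used, which is consistent with the abstract's claim that no extra experts or communicated parameters are needed.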

