LG AI DC · Jun 6, 2022

Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning

Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xin He, Bo Han, Xiaowen Chu

arXiv:2206.02465v226.6105 citations · h-index: 28 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses data heterogeneity in federated learning with an approach that is incremental in method but effective at improving model training across distributed clients.

The paper tackles client drift in federated learning, which is caused by data heterogeneity, by proposing virtual homogeneity learning (VHL). VHL uses a virtual homogeneous dataset to calibrate features, which the authors report drastically improves convergence speed and generalization performance.

In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream works focus on correcting client drift. We propose a different approach named virtual homogeneity learning (VHL) to directly "rectify" the data heterogeneity. In particular, VHL conducts FL with a virtual homogeneous dataset crafted to satisfy two conditions: containing no private information and being separable. The virtual dataset can be generated from pure noise shared across clients, aiming to calibrate the features from the heterogeneous clients. Theoretically, we prove that VHL can achieve provable generalization performance on the natural distribution. Empirically, we demonstrate that VHL endows FL with drastically improved convergence speed and generalization performance. VHL is the first attempt towards using a virtual dataset to address data heterogeneity, offering new and effective means to FL.
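
The two conditions on the virtual dataset (no private information, separable) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian cluster construction, the `spread` parameter, and the batch-mixing step are all assumptions made for illustration. Each client draws the same class-wise noise from a shared seed, so the virtual data is identical everywhere and derived from no private sample.

```python
import numpy as np


def make_virtual_dataset(num_classes, per_class, dim, seed=0, spread=0.1):
    """Build a noise-only dataset with one tight cluster per class.

    Hypothetical helper: every client calls it with the same seed, so all
    clients hold an identical virtual dataset without exchanging any data.
    """
    rng = np.random.default_rng(seed)
    # One random center per class, drawn from pure noise.
    centers = rng.standard_normal((num_classes, dim))
    labels = np.repeat(np.arange(num_classes), per_class)
    # Samples are small perturbations of their class center (assumed design).
    samples = centers[labels] + spread * rng.standard_normal((labels.size, dim))
    return samples, labels


def nearest_centroid_accuracy(samples, labels):
    """Rough separability check: classify each sample by its nearest class mean."""
    classes = np.unique(labels)
    centroids = np.stack([samples[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    predicted = classes[dists.argmin(axis=1)]
    return float((predicted == labels).mean())


def mix_batch(private_x, private_y, virtual_x, virtual_y):
    """Assumed local-training step: append virtual samples to a private batch."""
    return (np.concatenate([private_x, virtual_x]),
            np.concatenate([private_y, virtual_y]))


# Two clients regenerate the virtual data independently from the shared seed.
virtual_a = make_virtual_dataset(num_classes=10, per_class=20, dim=64, seed=42)
virtual_b = make_virtual_dataset(num_classes=10, per_class=20, dim=64, seed=42)
separability = nearest_centroid_accuracy(*virtual_a)
```

Sharing only a seed keeps communication cost negligible. How the virtual samples enter the training objective (for example, any feature-calibration term) is not specified in the abstract, so the concatenation in `mix_batch` is only a placeholder for that step.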
