LG | Dec 25, 2024

FedCFA: Alleviating Simpson's Paradox in Model Aggregation with Counterfactual Federated Learning

arXiv:2412.18904v1 | 11 citations | h-index: 26 | AAAI
Originality: Incremental advance
AI Analysis

This work addresses data imbalance and heterogeneity in federated learning, particularly in scenarios affected by Simpson's Paradox, and offers a domain-specific improvement over existing methods.

The paper tackles Simpson's Paradox in federated learning, where data heterogeneity across clients causes the aggregated global model to misrepresent the global data distribution. It proposes FedCFA, which uses counterfactual learning and factor decorrelation to improve model accuracy and efficiency, and reports superior performance on six datasets under limited communication rounds.
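
To make the aggregation failure concrete, here is a hypothetical toy illustration (not taken from the paper): two clients with non-overlapping feature ranges each show a downward trend, while their pooled data shows an upward one. Averaging the local slopes, a simplified stand-in for FedAvg with equal client sizes, preserves the downward trend, so the aggregated "model" contradicts the trend in the global data. The data, the function name `ols_slope`, and the slope-only model are all assumptions made for the sketch.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope of y on x (fitted with an intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return float((dx * (y - y.mean())).sum() / (dx ** 2).sum())

# Hypothetical toy data: two clients whose feature ranges do not overlap.
clients = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([6, 7, 8], [8, 7, 6]),
}

# Each client fits its own local model (here just a slope).
local_slopes = {name: ols_slope(x, y) for name, (x, y) in clients.items()}

# FedAvg-style aggregation with equal client sizes: average the local parameters.
aggregated_slope = float(np.mean(list(local_slopes.values())))

# Trend on the pooled (global) data, which no single client can see.
pooled_x = np.concatenate([np.asarray(x, dtype=float) for x, _ in clients.values()])
pooled_y = np.concatenate([np.asarray(y, dtype=float) for _, y in clients.values()])
global_slope = ols_slope(pooled_x, pooled_y)

print(local_slopes)            # {'A': -1.0, 'B': -1.0}
print(aggregated_slope)        # -1.0
print(round(global_slope, 3))  # 0.807
```

Real FL aggregation averages full parameter vectors weighted by sample counts; the single-slope model only isolates the sign reversal that the paper calls Simpson's Paradox.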

Federated learning (FL) is a promising technology for data privacy and distributed optimization, but it suffers from data imbalance and heterogeneity among clients. Existing FL methods try to solve these problems by aligning client models with the server model or by correcting client models with control variables. These methods excel on IID and general Non-IID data but perform only moderately well in Simpson's Paradox scenarios. Simpson's Paradox refers to the phenomenon in which a trend observed on the global dataset disappears or reverses on a subset; in FL, this can cause the global model obtained through aggregation to misrepresent the distribution of the global data. We therefore propose FedCFA, a novel FL framework that employs counterfactual learning to generate counterfactual samples by replacing the critical factors of local data with global average data, thereby aligning local data distributions with the global distribution and mitigating the effects of Simpson's Paradox. In addition, to improve the quality of the counterfactual samples, we introduce a factor decorrelation (FDC) loss that reduces the correlation among features and thus improves the independence of the extracted factors. Extensive experiments on six datasets verify that our method outperforms other FL methods in terms of efficiency and global model accuracy under limited communication rounds.
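
The two mechanisms described above can be sketched in NumPy under several assumptions that the abstract does not confirm: factors are columns of a per-sample feature matrix, the global average is a sample-size-weighted mean of client factor means, "critical" factors are the top-k by some importance score, and the FDC loss is approximated by the mean squared off-diagonal Pearson correlation between factors. All function names here are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def global_average_factors(client_means, client_sizes):
    """Hypothetical server step: sample-size-weighted mean of client factor means."""
    return np.average(np.asarray(client_means, dtype=float), axis=0,
                      weights=np.asarray(client_sizes, dtype=float))

def counterfactual_factors(z_local, z_global_mean, importance, k):
    """Replace the k highest-importance factors of every local sample with the
    matching entries of the global average factor vector.

    z_local:       (n, d) factors extracted from local samples (assumed given)
    z_global_mean: (d,)   global average factors (assumed to come from the server)
    importance:    (d,)   hypothetical per-factor importance scores
    """
    z_cf = np.array(z_local, dtype=float, copy=True)  # leave the input untouched
    critical = np.argsort(importance)[::-1][:k]       # indices of the k largest scores
    z_cf[:, critical] = np.asarray(z_global_mean, dtype=float)[critical]
    return z_cf, critical

def fdc_loss(z, eps=1e-8):
    """Stand-in decorrelation penalty: mean squared off-diagonal Pearson
    correlation between the factor columns of z, shape (n, d)."""
    z = np.asarray(z, dtype=float)
    zc = z - z.mean(axis=0)
    zn = zc / (zc.std(axis=0) + eps)          # standardize each factor
    corr = (zn.T @ zn) / z.shape[0]           # (d, d) correlation matrix
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.mean(off_diag ** 2))

rng = np.random.default_rng(0)
z_local = rng.normal(size=(8, 4))
z_global = global_average_factors([[0.0, 0.0, 0.0, 0.0], [2.0, 4.0, 6.0, 8.0]], [1, 3])
importance = np.array([0.1, 0.9, 0.3, 0.7])
z_cf, critical = counterfactual_factors(z_local, z_global, importance, k=2)
print(critical)  # [1 3]
```

In training, a penalty like `fdc_loss` would be written in an autodiff framework and added to the task loss with some weighting coefficient; NumPy is used here only to show what quantity is being penalized.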

Code Implementations: 1 repo
