LGDCOCAug 16, 2023

DFedADMM: Dual Constraints Controlled Model Inconsistency for Decentralized Federated Learning

arXiv:2308.08290v18 citationsh-index: 35
Originality Incremental advance
AI Analysis

This work improves DFL for distributed machine learning systems by enhancing model performance and efficiency, though it is incremental as it builds on existing ADMM and SAM techniques.

The paper tackles communication burden and performance issues in decentralized federated learning (DFL) by proposing DFedADMM and DFedADMM-SAM algorithms, which address local inconsistency and heterogeneous overfitting, achieving superior generalization and convergence speed on datasets like MNIST and CIFAR10/100 compared to state-of-the-art methods.

To address the communication burden issues associated with federated learning (FL), decentralized federated learning (DFL) discards the central server and establishes a decentralized communication network, where each client communicates only with neighboring clients. However, existing DFL methods still suffer from two major challenges: local inconsistency and local heterogeneous overfitting, which have not been fundamentally addressed by existing DFL methods. To tackle these issues, we propose novel DFL algorithms, DFedADMM and its enhanced version DFedADMM-SAM, to enhance the performance of DFL. The DFedADMM algorithm employs primal-dual optimization (ADMM) by utilizing dual variables to control the model inconsistency raised from the decentralized heterogeneous data distributions. The DFedADMM-SAM algorithm further improves on DFedADMM by employing a Sharpness-Aware Minimization (SAM) optimizer, which uses gradient perturbations to generate locally flat models and searches for models with uniformly low loss values to mitigate local heterogeneous overfitting. Theoretically, we derive convergence rates of $\small \mathcal{O}\Big(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-ψ)^2}\Big)$ and $\small \mathcal{O}\Big(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-ψ)^2}+ \frac{1}{T^{3/2}K^{1/2}}\Big)$ in the non-convex setting for DFedADMM and DFedADMM-SAM, respectively, where $1 - ψ$ represents the spectral gap of the gossip matrix. Empirically, extensive experiments on MNIST, CIFAR10 and CIFAR100 datesets demonstrate that our algorithms exhibit superior performance in terms of both generalization and convergence speed compared to existing state-of-the-art (SOTA) optimizers in DFL.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes