OCDCLGSep 11, 2025

Pareto-optimal Tradeoffs Between Communication and Computation with Flexible Gradient Tracking

arXiv:2509.18129v1h-index: 3
Originality Highly original
AI Analysis

It addresses communication-computation trade-offs in distributed optimization for heterogeneous networks, offering incremental improvements with Pareto-optimality.

This paper tackles distributed optimization in non-i.i.d. settings by proposing FlexGT, a flexible gradient tracking method that balances communication and computation, achieving linear or sublinear convergence rates and matching or improving existing complexity bounds, such as an optimal iteration complexity of Ψ(L/ε + Lσ^2/(nε^2√(1-√ρ_W))) for nonconvex cases.

This paper addresses distributed optimization problems in non-i.i.d. scenarios, focusing on the interplay between communication and computation efficiency. To this end, we propose FlexGT, a flexible snapshot gradient tracking method with tunable numbers of local updates and neighboring communications in each round. Leveraging a unified convergence analysis framework, we prove that FlexGT achieves a linear or sublinear convergence rate depending on objective-specific properties--from (strongly) convex to nonconvex--and the above-mentioned tunable parameters. FlexGT is provably robust to the heterogeneity across nodes and attains the best-known communication and computation complexity among existing results. Moreover, we introduce an accelerated gossip-based variant, termed Acc-FlexGT, and show that with prior knowledge of the graph, it achieves a Pareto-optimal trade-off between communication and computation. Particularly, Acc-FlexGT achieves the optimal iteration complexity of $\tilde{\mathcal{O}} \left( L/ε+Lσ^2/\left( nε^2 \sqrt{1-\sqrt{ρ_W}} \right) \right) $ for the nonconvex case, matching the existing lower bound up to a logarithmic factor, and improves the existing results for the strongly convex case by a factor of $\tilde{\mathcal{O}} \left( 1/\sqrtε \right)$, where $ε$ is the targeted accuracy, $n$ the number of nodes, $L$ the Lipschitz constant, $ρ_W$ the spectrum gap of the graph, and $σ$ the stochastic gradient variance. Numerical examples are provided to demonstrate the effectiveness of the proposed methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes