LG OC ST
Jan 21, 2022

Decentralized Sparse Linear Regression via Gradient-Tracking: Linear Convergence and Statistical Guarantees

Marie Maros, Gesualdo Scutari, Ying Sun, Guang Cheng

arXiv:2201.08507v2
5.84 citations

Originality: Incremental advance

AI Analysis

This work addresses efficient, scalable estimation in distributed systems, with applications such as sensor networks. Its contribution is incremental: it builds on existing gradient-tracking methods and adds new statistical guarantees.

The paper tackles decentralized sparse linear regression across a network of agents by proposing a distributed projected gradient tracking algorithm, achieving linear convergence to an estimate within the centralized statistical precision of O(s log d/N) under high-dimensional scaling.

We study sparse linear regression over a network of agents, modeled as an undirected graph with no server node. The estimation of the $s$-sparse parameter is formulated as a constrained LASSO problem in which each agent owns a subset of the $N$ total observations. We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm under high-dimensional scaling, allowing the ambient dimension $d$ to grow with (and possibly exceed) the sample size $N$. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions, and under suitable conditions on the network connectivity and algorithm tuning, the distributed algorithm converges globally at a linear rate to an estimate that is within the centralized statistical precision of the model, $O(s\log d/N)$. When $s\log d/N=o(1)$, a condition necessary for statistical consistency, an $\varepsilon$-optimal solution is attained after $O(\kappa\log(1/\varepsilon))$ gradient computations and $O(\kappa/(1-\rho)\log(1/\varepsilon))$ communication rounds, where $\kappa$ is the restricted condition number of the loss function and $\rho$ measures the network connectivity. The computation cost matches that of the centralized projected gradient algorithm even though the data are distributed, whereas the number of communication rounds decreases as the network connectivity improves. Overall, our study reveals interesting connections between statistical efficiency, network connectivity and topology, and convergence rate in high dimensions.
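To make the setting in the abstract concrete, here is a minimal NumPy sketch of one generic projected gradient-tracking scheme for an $\ell_1$-constrained least-squares problem. It is an illustration under stated assumptions, not necessarily the authors' exact update: the function names, the mixing matrix `W` (assumed symmetric and doubly stochastic), the step size `step`, and the constraint radius `radius` are all hypothetical choices made for this example.

```python
import numpy as np


def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball {x : ||x||_1 <= radius}."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]           # magnitudes, descending
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    k = np.nonzero(u * ks > css - radius)[0][-1]
    theta = (css[k] - radius) / (k + 1.0)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def projected_gradient_tracking(A_list, b_list, W, radius, step, iters):
    """Generic projected gradient tracking (illustrative variant).

    Agent i holds (A_list[i], b_list[i]). Local losses are scaled so that
    their average equals the global loss (1 / (2N)) * ||A x - b||^2.
    Returns an (m, d) array whose i-th row is agent i's estimate.
    """
    m = len(A_list)
    d = A_list[0].shape[1]
    N = sum(A.shape[0] for A in A_list)

    def local_grads(X):
        return np.stack([
            (m / N) * A_list[i].T @ (A_list[i] @ X[i] - b_list[i])
            for i in range(m)
        ])

    X = np.zeros((m, d))
    G = local_grads(X)
    Y = G.copy()  # tracking variables start at the local gradients

    for _ in range(iters):
        # Mix neighbors' estimates, step along the tracked gradient, project.
        Z = W @ X - step * Y
        X_new = np.stack([project_l1_ball(z, radius) for z in Z])
        # Tracking update: mix, then add the change in local gradients.
        G_new = local_grads(X_new)
        Y = W @ Y + G_new - G
        X, G = X_new, G_new
    return X
```

Each iteration uses one local gradient evaluation and two neighbor exchanges (one for the estimates, one for the tracking variables). With a doubly stochastic `W`, the average of the tracking variables equals the average of the local gradients at every iteration, which is the property gradient tracking relies on.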
