César A. Uribe

OC
h-index12
31papers
615citations
Novelty50%
AI Score49

31 Papers

OCOct 10, 2022
On the Performance of Gradient Tracking with Local Updates

Edward Duc Hien Nguyen, Sulaiman A. Alghunaim, Kun Yuan et al.

We study the decentralized optimization problem where a network of $n$ agents seeks to minimize the average of a set of heterogeneous non-convex cost functions distributedly. State-of-the-art decentralized algorithms like Exact Diffusion~(ED) and Gradient Tracking~(GT) involve communicating every iteration. However, communication is expensive, resource intensive, and slow. In this work, we analyze a locally updated GT method (LU-GT), where agents perform local recursions before interacting with their neighbors. While local updates have been shown to reduce communication overhead in practice, their theoretical influence has not been fully characterized. We show LU-GT has the same communication complexity as the Federated Learning setting but allows arbitrary network topologies. In addition, we prove that the number of local updates does not degrade the quality of the solution achieved by LU-GT. Numerical examples reveal that local updates can lower communication costs in certain regimes (e.g., well-connected graphs).

LGOct 3, 2022
Unbounded Gradients in Federated Learning with Buffered Asynchronous Aggregation

Mohammad Taha Toghani, César A. Uribe

Synchronous updates may compromise the efficiency of cross-device federated learning once the number of active clients increases. The \textit{FedBuff} algorithm (Nguyen et al., 2022) alleviates this problem by allowing asynchronous updates (staleness), which enhances the scalability of training while preserving privacy via secure aggregation. We revisit the \textit{FedBuff} algorithm for asynchronous federated learning and extend the existing analysis by removing the boundedness assumptions from the gradient norm. This paper presents a theoretical analysis of the convergence rate of this algorithm when heterogeneity in data, batch size, and delay are considered.

LGJun 19, 2023
Adaptive Federated Learning with Auto-Tuned Clients

Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe et al.

Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data. While being a flexible framework, where the distribution of local data, participation rate, and computing power of each client can greatly vary, such flexibility gives rise to many new challenges, especially in the hyperparameter tuning on the client side. We propose $Δ$-SGD, a simple step size rule for SGD that enables each client to use its own step size by adapting to the local smoothness of the function each client is optimizing. We provide theoretical and empirical results where the benefit of the client adaptivity is shown in various FL scenarios.

LGOct 3, 2022
PersA-FL: Personalized Asynchronous Federated Learning

Mohammad Taha Toghani, Soomin Lee, César A. Uribe

We study the personalized federated learning problem under asynchronous updates. In this problem, each client seeks to obtain a personalized model that simultaneously outperforms local and global models. We consider two optimization-based frameworks for personalization: (i) Model-Agnostic Meta-Learning (MAML) and (ii) Moreau Envelope (ME). MAML involves learning a joint model adapted for each client through fine-tuning, whereas ME requires a bi-level optimization problem with implicit gradients to enforce personalization via regularized losses. We focus on improving the scalability of personalized federated learning by removing the synchronous communication assumption. Moreover, we extend the studied function class by removing boundedness assumptions on the gradient norm. Our main technical contribution is a unified proof for asynchronous federated learning with bounded staleness that we apply to MAML and ME personalization frameworks. For the smooth and non-convex functions class, we show the convergence of our method to a first-order stationary point. We illustrate the performance of our method and its tolerance to staleness through experiments for classification tasks over heterogeneous datasets.

LGMar 14, 2022
The Role of Local Steps in Local SGD

Tiancheng Qin, S. Rasoul Etesami, César A. Uribe

We consider the distributed stochastic optimization problem where $n$ agents want to minimize a global function given by the sum of agents' local functions, and focus on the heterogeneous setting when agents' local functions are defined over non-i.i.d. data sets. We study the Local SGD method, where agents perform a number of local stochastic gradient steps and occasionally communicate with a central node to improve their local optimization tasks. We analyze the effect of local steps on the convergence rate and the communication complexity of Local SGD. In particular, instead of assuming a fixed number of local steps across all communication rounds, we allow the number of local steps during the $i$-th communication round, $H_i$, to be different and arbitrary numbers. Our main contribution is to characterize the convergence rate of Local SGD as a function of $\{H_i\}_{i=1}^R$ under various settings of strongly convex, convex, and nonconvex local functions, where $R$ is the total number of communication rounds. Based on this characterization, we provide sufficient conditions on the sequence $\{H_i\}_{i=1}^R$ such that Local SGD can achieve linear speed-up with respect to the number of workers. Furthermore, we propose a new communication strategy with increasing local steps superior to existing communication strategies for strongly convex local functions. On the other hand, for convex and nonconvex local functions, we argue that fixed local steps are the best communication strategy for Local SGD and recover state-of-the-art convergence rate results. Finally, we justify our theoretical results through extensive numerical experiments.

QUANT-PHMar 22, 2022
Local Stochastic Factored Gradient Descent for Distributed Quantum State Tomography

Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe et al.

We propose a distributed Quantum State Tomography (QST) protocol, named Local Stochastic Factored Gradient Descent (Local SFGD), to learn the low-rank factor of a density matrix over a set of local machines. QST is the canonical procedure to characterize the state of a quantum system, which we formulate as a stochastic nonconvex smooth optimization problem. Physically, the estimation of a low-rank density matrix helps characterizing the amount of noise introduced by quantum computation. Theoretically, we prove the local convergence of Local SFGD for a general class of restricted strongly convex/smooth loss functions, i.e., Local SFGD converges locally to a small neighborhood of the global optimum at a linear rate with a constant step size, while it locally converges exactly at a sub-linear rate with diminishing step sizes. With a proper initialization, local convergence results imply global convergence. We validate our theoretical findings with numerical simulations of QST on the Greenberger-Horne-Zeilinger (GHZ) state.

OCApr 18, 2022
On Arbitrary Compression for Decentralized Consensus and Stochastic Optimization over Directed Networks

Mohammad Taha Toghani, César A. Uribe

We study the decentralized consensus and stochastic optimization problems with compressed communications over static directed graphs. We propose an iterative gradient-based algorithm that compresses messages according to a desired compression ratio. The proposed method provably reduces the communication overhead on the network at every communication round. Contrary to existing literature, we allow for arbitrary compression ratios in the communicated messages. We show a linear convergence rate for the proposed method on the consensus problem. Moreover, we provide explicit convergence rates for decentralized stochastic optimization problems on smooth functions that are either (i) strongly convex, (ii) convex, or (iii) non-convex. Finally, we provide numerical experiments to illustrate convergence under arbitrary compression ratios and the communication efficiency of our algorithm.

OCApr 16, 2022
On Acceleration of Gradient-Based Empirical Risk Minimization using Local Polynomial Regression

Ekaterina Trimbach, Edward Duc Hien Nguyen, César A. Uribe

We study the acceleration of the Local Polynomial Interpolation-based Gradient Descent method (LPI-GD) recently proposed for the approximate solution of empirical risk minimization problems (ERM). We focus on loss functions that are strongly convex and smooth with condition number $σ$. We additionally assume the loss function is $η$-Hölder continuous with respect to the data. The oracle complexity of LPI-GD is $\tilde{O}\left(σm^d \log(1/\varepsilon)\right)$ for a desired accuracy $\varepsilon$, where $d$ is the dimension of the parameter space, and $m$ is the cardinality of an approximation grid. The factor $m^d$ can be shown to scale as $O((1/\varepsilon)^{d/2η})$. LPI-GD has been shown to have better oracle complexity than gradient descent (GD) and stochastic gradient descent (SGD) for certain parameter regimes. We propose two accelerated methods for the ERM problem based on LPI-GD and show an oracle complexity of $\tilde{O}\left(\sqrtσ m^d \log(1/\varepsilon)\right)$. Moreover, we provide the first empirical study on local polynomial interpolation-based gradient methods and corroborate that LPI-GD has better performance than GD and SGD in some scenarios, and the proposed methods achieve acceleration.

LGNov 27, 2023
Improving Denoising Diffusion Probabilistic Models via Exploiting Shared Representations

Delaram Pirhayatifard, Mohammad Taha Toghani, Guha Balakrishnan et al.

In this work, we address the challenge of multi-task image generation with limited data for denoising diffusion probabilistic models (DDPM), a class of generative models that produce high-quality images by reversing a noisy diffusion process. We propose a novel method, SR-DDPM, that leverages representation-based techniques from few-shot learning to effectively learn from fewer samples across different tasks. Our method consists of a core meta architecture with shared parameters, i.e., task-specific layers with exclusive parameters. By exploiting the similarity between diverse data distributions, our method can scale to multiple tasks without compromising the image quality. We evaluate our method on standard image datasets and show that it outperforms both unconditional and conditional DDPM in terms of FID and SSIM metrics.

5.4MLApr 20
Sparse Network Inference under Imperfect Detection and its Application to Ecological Networks

Aoran Zhang, Tianyao Wei, Maria J. Guerrero et al.

Recovering latent structure from count data has received considerable attention in network inference, particularly when one seeks both cross-group interactions and within-group similarity patterns in bipartite networks, which is widely used in ecology research. Such networks are often sparse and inherently imperfect in their detection. Existing models mainly focus on interaction recovery, while the induced similarity graphs are much less studied. Moreover, sparsity is often not controlled, and scale is unbalanced, leading to oversparse or poorly rescaled estimates with degrading structural recovery. To address these issues, we propose a framework for structured sparse nonnegative low-rank factorization with detection probability estimation. We impose nonconvex $\ell_{1/2}$ regularization on the latent similarity and connectivity structures to promote sparsity within-group similarity and cross-group connectivity with better relative scale. The resulting optimization problem is nonconvex and nonsmooth. To solve it, we develop an ADMM-based algorithm with adaptive penalization and scale-aware initialization and establish its asymptotic feasibility and KKT stationarity of cluster points under mild regularity conditions. Experiments on synthetic and real-world ecological datasets demonstrate improved recovery of latent factors and similarity/connectivity structure relative to existing baselines.

38.9OCApr 24
Global Convergence of Policy Gradient Methods for ReLU Controllers in Linear Quadratic Regulation

Jhojan A. Rodriguez-Gil, César A. Uribe

We study the convergence of model-based policy gradient for the deterministic, scalar, discounted linear-quadratic regulator when the controller is an overparameterized one-hidden-layer ReLU network without biases. Although the optimal LQR controller is linear, neural parameterization creates a redundant nonconvex weight space with a possibly asymmetric piecewise-linear controller. We show that this structure can still be analyzed exactly through the two effective gains induced on the positive and negative half-lines. Under suitable random initialization, sufficient width, and a small step size, the model-based policy gradient remains stable, decreases the cost geometrically, and drives the effective gains to the unique optimal scalar LQR gain with high probability.

AIFeb 25, 2024
PIDformer: Transformer Meets Control Theory

Tam Nguyen, César A. Uribe, Tan M. Nguyen et al.

In this work, we address two main shortcomings of transformer architectures: input corruption and rank collapse in their output representation. We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions, leading to lower-rank outputs and diminished representation capacity. Moreover, the steady-state solution of the model is sensitive to input perturbations. We incorporate a Proportional-Integral-Derivative (PID) closed-loop feedback control system with a reference point into the model to improve robustness and representation capacity. This integration aims to preserve high-frequency details while bolstering model stability, rendering it more noise-resilient. The resulting controlled state-space model is theoretically proven robust and adept at addressing the rank collapse. Motivated by this control framework, we derive a novel class of transformers, PID-controlled Transformer (PIDformer), aimed at improving robustness and mitigating the rank-collapse issue inherent in softmax transformers. We empirically evaluate the model for advantages and robustness against baseline transformers across various practical tasks, including object classification, image segmentation, and language modeling.

OCMar 26, 2024
A Moreau Envelope Approach for LQR Meta-Policy Estimation

Ashwin Aravind, Mohammad Taha Toghani, César A. Uribe

We study the problem of policy estimation for the Linear Quadratic Regulator (LQR) in discrete-time linear time-invariant uncertain dynamical systems. We propose a Moreau Envelope-based surrogate LQR cost, built from a finite set of realizations of the uncertain system, to define a meta-policy efficiently adjustable to new realizations. Moreover, we design an algorithm to find an approximate first-order stationary point of the meta-LQR cost function. Numerical results show that the proposed approach outperforms naive averaging of controllers on new realizations of the linear system. We also provide empirical evidence that our method has better sample complexity than Model-Agnostic Meta-Learning (MAML) approaches.

OCMar 7, 2024
Decentralized and Equitable Optimal Transport

Ivan Lau, Shiqian Ma, César A. Uribe

This paper considers the decentralized (discrete) optimal transport (D-OT) problem. In this setting, a network of agents seeks to design a transportation plan jointly, where the cost function is the sum of privately held costs for each agent. We reformulate the D-OT problem as a constraint-coupled optimization problem and propose a single-loop decentralized algorithm with an iteration complexity of O(1/ε) that matches existing centralized first-order approaches. Moreover, we propose the decentralized equitable optimal transport (DE-OT) problem. In DE-OT, in addition to cooperatively designing a transportation plan that minimizes transportation costs, agents seek to ensure equity in their individual costs. The iteration complexity of the proposed method to solve DE-OT is also O(1/ε). This rate improves existing centralized algorithms, where the best iteration complexity obtained is O(1/ε^2).

LGNov 11, 2025
Gromov-Wasserstein Graph Coarsening

Carlos A. Taveras, Santiago Segarra, César A. Uribe

We study the problem of graph coarsening within the Gromov-Wasserstein geometry. Specifically, we propose two algorithms that leverage a novel representation of the distortion induced by merging pairs of nodes. The first method, termed Greedy Pair Coarsening (GPC), iteratively merges pairs of nodes that locally minimize a measure of distortion until the desired size is achieved. The second method, termed $k$-means Greedy Pair Coarsening (KGPC), leverages clustering based on pairwise distortion metrics to directly merge clusters of nodes. We provide conditions guaranteeing optimal coarsening for our methods and validate their performance on six large-scale datasets and a downstream clustering task. Results show that the proposed methods outperform existing approaches on a wide range of parameters and scenarios.

MLJun 18, 2024
An Optimal Transport Approach for Network Regression

Alex G. Zalles, Kai M. Hung, Ann E. Finneran et al.

We study the problem of network regression, where one is interested in how the topology of a network changes as a function of Euclidean covariates. We build upon recent developments in generalized regression models on metric spaces based on Fréchet means and propose a network regression method using the Wasserstein metric. We show that when representing graphs as multivariate Gaussian distributions, the network regression problem requires the computation of a Riemannian center of mass (i.e., Fréchet means). Fréchet means with non-negative weights translates into a barycenter problem and can be efficiently computed using fixed point iterations. Although the convergence guarantees of fixed-point iterations for the computation of Wasserstein affine averages remain an open problem, we provide evidence of convergence in a large number of synthetic and real-data scenarios. Extensive numerical results show that the proposed approach improves existing procedures by accurately accounting for graph size, topology, and sparsity in synthetic experiments. Additionally, real-world experiments using the proposed approach result in higher Coefficient of Determination ($R^{2}$) values and lower mean squared prediction error (MSPE), cementing improved prediction capabilities in practice.

LGMay 20, 2023
On First-Order Meta-Reinforcement Learning with Moreau Envelopes

Mohammad Taha Toghani, Sebastian Perez-Salazar, César A. Uribe

Meta-Reinforcement Learning (MRL) is a promising framework for training agents that can quickly adapt to new environments and tasks. In this work, we study the MRL problem under the policy gradient formulation, where we propose a novel algorithm that uses Moreau envelope surrogate regularizers to jointly learn a meta-policy that is adjustable to the environment of each individual task. Our algorithm, called Moreau Envelope Meta-Reinforcement Learning (MEMRL), learns a meta-policy that can adapt to a distribution of tasks by efficiently updating the policy parameters using a combination of gradient-based optimization and Moreau Envelope regularization. Moreau Envelopes provide a smooth approximation of the policy optimization problem, which enables us to apply standard optimization techniques and converge to an appropriate stationary point. We provide a detailed analysis of the MEMRL algorithm, where we show a sublinear convergence rate to a first-order stationary point for non-convex policy gradient optimization. We finally show the effectiveness of MEMRL on a multi-task 2D-navigation problem.

LGJan 30, 2022
Faster Convergence of Local SGD for Over-Parameterized Models

Tiancheng Qin, S. Rasoul Etesami, César A. Uribe

Modern machine learning architectures are often highly expressive. They are usually over-parameterized and can interpolate the data by driving the empirical loss close to zero. We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized models in the heterogeneous data setting and improve upon the existing literature by establishing the following convergence rates. For general convex loss functions, we establish an error bound of $Ø(1/T)$ under a mild data similarity assumption and an error bound of $Ø(K/T)$ otherwise, where $K$ is the number of local steps and $T$ is the total number of iterations. For non-convex loss functions we prove an error bound of $Ø(K/T)$. These bounds improve upon the best previous bound of $Ø(1/\sqrt{nT})$ in both cases, where $n$ is the number of nodes, when no assumption on the model being over-parameterized is made. We complete our results by providing problem instances in which our established convergence rates are tight to a constant factor with a reasonably small stepsize. Finally, we validate our theoretical results by performing large-scale numerical experiments that reveal the convergence behavior of Local SGD for practical over-parameterized deep learning models, in which the $Ø(1/T)$ convergence rate of Local SGD is clearly shown.

OCSep 14, 2021
Scalable Average Consensus with Compressed Communications

Mohammad Taha Toghani, César A. Uribe

We propose a new decentralized average consensus algorithm with compressed communication that scales linearly with the network size n. We prove that the proposed method converges to the average of the initial values held locally by the agents of a network when agents are allowed to communicate with compressed messages. The proposed algorithm works for a broad class of compression operators (possibly biased), where agents interact over arbitrary static, undirected, and connected networks. We further present numerical experiments that confirm our theoretical results and illustrate the scalability and communication efficiency of our algorithm.

LGFeb 14, 2021
Communication-efficient Distributed Cooperative Learning with Compressed Beliefs

Mohammad Taha Toghani, César A. Uribe

We study the problem of distributed cooperative learning, where a group of agents seeks to agree on a set of hypotheses that best describes a sequence of private observations. In the scenario where the set of hypotheses is large, we propose a belief update rule where agents share compressed (either sparse or quantized) beliefs with an arbitrary positive compression rate. Our algorithm leverages a unified communication rule that enables agents to access wide-ranging compression operators as black-box modules. We prove the almost sure asymptotic exponential convergence of beliefs around the set of optimal hypotheses. Additionally, we show a non-asymptotic, explicit, and linear concentration rate in probability of the beliefs on the optimal hypothesis set. We provide numerical experiments to illustrate the communication benefits of our method. The simulation results show that the number of transmitted bits can be reduced to 5-10% of the non-compressed method in the studied scenarios.

LGNov 6, 2020
Communication-efficient Decentralized Local SGD over Undirected Networks

Tiancheng Qin, S. Rasoul Etesami, César A. Uribe

We consider the distributed learning problem where a network of $n$ agents seeks to minimize a global function $F$. Agents have access to $F$ through noisy gradients, and they can locally communicate with their neighbors a network. We study the Decentralized Local SDG method, where agents perform a number of local gradient steps and occasionally exchange information with their neighbors. Previous algorithmic analysis efforts have focused on the specific network topology (star topology) where a leader node aggregates all agents' information. We generalize that setting to an arbitrary network by analyzing the trade-off between the number of communication rounds and the computational effort of each agent. We bound the expected optimality gap in terms of the number of iterates $T$, the number of workers $n$, and the spectral gap of the underlying network. Our main results show that by using only $R=Ω(n)$ communication rounds, one can achieve an error that scales as $O({1}/{nT})$, where the number of communication rounds is independent of $T$ and only depends on the number of agents. Finally, we provide numerical evidence of our theoretical results through experiments on real and synthetic data.

OCOct 20, 2020
Robust Asynchronous and Network-Independent Cooperative Learning

Eduardo Mojica-Nava, David Yanguas-Rojas, César A. Uribe

We consider the model of cooperative learning via distributed non-Bayesian learning, where a network of agents tries to jointly agree on a hypothesis that best described a sequence of locally available observations. Building upon recently proposed weak communication network models, we propose a robust cooperative learning rule that allows asynchronous communications, message delays, unpredictable message losses, and directed communication among nodes. We show that our proposed learning dynamics guarantee that all agents in the network will have an asymptotic exponential decay of their beliefs on the wrong hypothesis, indicating that the beliefs of all agents will concentrate on the optimal hypotheses. Numerical experiments provide evidence on a number of network setups.

OCJul 7, 2020
A Distributed Cubic-Regularized Newton Method for Smooth Convex Optimization over Networks

César A. Uribe, Ali Jadbabaie

We propose a distributed, cubic-regularized Newton method for large-scale convex optimization over networks. The proposed method requires only local computations and communications and is suitable for federated learning applications over arbitrary network topologies. We show a $O(k^{{-}3})$ convergence rate when the cost function is convex with Lipschitz gradient and Hessian, with $k$ being the number of iterations. We further provide network-dependent bounds for the communication required in each step of the algorithm. We provide numerical experiments that validate our theoretical results.

OCNov 4, 2019
Generalized Self-concordant Hessian-barrier algorithms

Pavel Dvurechensky, Mathias Staudigl, César A. Uribe

Many problems in statistical learning, imaging, and computer vision involve the optimization of a non-convex objective function with singularities at the boundary of the feasible set. For such challenging instances, we develop a new interior-point technique building on the Hessian-barrier algorithm recently introduced in Bomze, Mertikopoulos, Schachinger and Staudigl, [SIAM J. Opt. 2019 29(3), pp. 2100-2127], where the Riemannian metric is induced by a generalized self-concordant function. This class of functions is sufficiently general to include most of the commonly used barrier functions in the literature of interior point methods. We prove global convergence to an approximate stationary point of the method, and in cases where the feasible set admits an easily computable self-concordant barrier, we verify worst-case optimal iteration complexity of the method. Applications in non-convex statistical estimation and $L^{p}$-minimization are discussed to given the efficiency of the method.

OCSep 3, 2018
A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks

César A. Uribe, Soomin Lee, Alexander Gasnikov et al.

We study dual-based algorithms for distributed convex optimization problems over networks, where the objective is to minimize a sum $\sum_{i=1}^{m}f_i(z)$ of functions over in a network. We provide complexity bounds for four different cases, namely: each function $f_i$ is strongly convex and smooth, each function is either strongly convex or smooth, and when it is convex but neither strongly convex nor smooth. Our approach is based on the dual of an appropriately formulated primal problem, which includes a graph that models the communication restrictions. We propose distributed algorithms that achieve the same optimal rates as their centralized counterparts (up to constant and logarithmic factors), with an additional optimal cost related to the spectral properties of the network. Initially, we focus on functions for which we can explicitly minimize its Legendre-Fenchel conjugate, i.e., admissible or dual friendly functions. Then, we study distributed optimization algorithms for non-dual friendly functions, as well as a method to improve the dependency on the parameters of the functions involved. Numerical analysis of the proposed algorithms is also provided.

OCMar 8, 2018
Distributed Computation of Wasserstein Barycenters over Networks

César A. Uribe, Darina Dvinskikh, Pavel Dvurechensky et al.

We propose a new \cu{class-optimal} algorithm for the distributed computation of Wasserstein Barycenters over networks. Assuming that each node in a graph has a probability distribution, we prove that every node can reach the barycenter of all distributions held in the network by using local interactions compliant with the topology of the graph. We provide an estimate for the minimum number of communication rounds required for the proposed method to achieve arbitrary relative precision both in the optimality of the solution and the consensus among all agents for undirected fixed networks.

OCDec 1, 2017
Optimal Algorithms for Distributed Optimization

César A. Uribe, Soomin Lee, Alexander Gasnikov et al.

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely: the function $F(\xb) \triangleq \sum_{i=1}^{m}f_i(\xb)$ is strongly convex and smooth, either strongly convex or smooth or just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors) with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup such as proximal friendly functions, time-varying graphs, improvement of the condition numbers.

OCApr 10, 2017
Distributed Learning for Cooperative Inference

Angelia Nedić, Alex Olshevsky, César A. Uribe

We study the problem of cooperative inference where a group of agents interact over a network and seek to estimate a joint parameter that best explains a set of observations. Agents do not know the network topology or the observations of other agents. We explore a variational interpretation of the Bayesian posterior density, and its relation to the stochastic mirror descent algorithm, to propose a new distributed learning algorithm. We show that, under appropriate assumptions, the beliefs generated by the proposed algorithm concentrate around the true parameter exponentially fast. We provide explicit non-asymptotic bounds for the convergence rate. Moreover, we develop explicit and computationally efficient algorithms for observation models belonging to exponential families.

OCDec 6, 2016
Distributed Gaussian Learning over Time-varying Directed Graphs

Angelia Nedić, Alex Olshevsky, César A. Uribe

We present a distributed (non-Bayesian) learning algorithm for the problem of parameter estimation with Gaussian noise. The algorithm is expressed as explicit updates on the parameters of the Gaussian beliefs (i.e. means and precision). We show a convergence rate of $O(1/k)$ with the constant term depending on the number of agents and the topology of the network. Moreover, we show almost sure convergence to the optimal solution of the estimation problem for the general case of time-varying directed graphs.

OCSep 23, 2016
A Tutorial on Distributed (Non-Bayesian) Learning: Problem, Algorithms and Results

Angelia Nedić, Alex Olshevsky, César A. Uribe

We overview some results on distributed learning with focus on a family of recently proposed algorithms known as non-Bayesian social learning. We consider different approaches to the distributed learning problem and its algorithmic solutions for the case of finitely many hypotheses. The original centralized problem is discussed at first, and then followed by a generalization to the distributed setting. The results on convergence and convergence rate are presented for both asymptotic and finite time regimes. Various extensions are discussed such as those dealing with directed time-varying networks, Nesterov's acceleration technique and a continuum sets of hypothesis.

OCSep 19, 2016
Geometrically Convergent Distributed Optimization with Uncoordinated Step-Sizes

Angelia Nedić, Alex Olshevsky, Wei Shi et al.

A recent algorithmic family for distributed optimization, DIGing's, have been shown to have geometric convergence over time-varying undirected/directed graphs. Nevertheless, an identical step-size for all agents is needed. In this paper, we study the convergence rates of the Adapt-Then-Combine (ATC) variation of the DIGing algorithm under uncoordinated step-sizes. We show that the ATC variation of DIGing algorithm converges geometrically fast even if the step-sizes are different among the agents. In addition, our analysis implies that the ATC structure can accelerate convergence compared to the distributed gradient descent (DGD) structure which has been used in the original DIGing algorithm.