SYMay 29, 2019
Nonexpansive Piecewise Constant Hybrid Systems are ConservativeArsalan Sharifnassab, John N. Tsitsiklis, S. Jamaloddin Golestani
Consider a partition of $R^n$ into finitely many polyhedral regions $D_i$ and associated drift vectors $μ_i\in R^n$. We study ``hybrid'' dynamical systems whose trajectories have a constant drift, $\dot x=μ_i$, whenever $x$ is in the interior of the $i$th region $D_i$, and behave consistently on the boundary between different regions. Our main result asserts that if such a system is nonexpansive (i.e., if the Euclidean distance between any pair of trajectories is a nonincreasing function of time), then the system must be conservative, i.e., its trajectories are the same as the trajectories of the negative subgradient flow associated with a potential function. Furthermore, this potential function is necessarily convex, and is linear on each of the regions $D_i$. We actually establish a more general version of this result, by making seemingly weaker assumptions on the dynamical system of interest.
SYMay 22, 2020
When do Trajectories have Bounded Sensitivity to Cumulative Perturbations?Arsalan Sharifnassab, S. Jamaloddin Golestani
We investigate sensitivity to cumulative perturbations for a few dynamical system classes of practical interest. A system is said to have bounded sensitivity to cumulative perturbations (bounded sensitivity, for short) if an additive disturbance leads to a change in the state trajectory that is bounded by a constant multiple of the size of the cumulative disturbance. As our main result, we show that there exist dynamical systems in the form of (negative) gradient field of a convex function that have unbounded sensitivity. We show that the result holds even when the convex potential function is piecewise linear. This resolves a question raised in [1], wherein it was shown that the (negative) (sub)gradient field of a piecewise linear and convex function has bounded sensitivity if the number of linear pieces is finite. Our results establish that the finiteness assumption is indeed necessary. Among our other results, we provide a necessary and sufficient condition for a linear dynamical system to have bounded sensitivity to cumulative perturbations. We also establish that the bounded sensitivity property is preserved, when a dynamical system with bounded sensitivity undergoes certain transformations. These transformations include convolution, time discretization, and spreading of a system (a transformation that captures approximate solutions of a system).
LGAug 19, 2021
Order Optimal Bounds for One-Shot Federated Learning over non-Convex Loss FunctionsArsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani
We consider the problem of federated learning in a one-shot setting in which there are $m$ machines, each observing $n$ sample functions from an unknown distribution on non-convex loss functions. Let $F:[-1,1]^d\to\mathbb{R}$ be the expected loss function with respect to this unknown distribution. The goal is to find an estimate of the minimizer of $F$. Based on its observations, each machine generates a signal of bounded length $B$ and sends it to a server. The server collects signals of all machines and outputs an estimate of the minimizer of $F$. We show that the expected loss of any algorithm is lower bounded by $\max\big(1/(\sqrt{n}(mB)^{1/d}), 1/\sqrt{mn}\big)$, up to a logarithmic factor. We then prove that this lower bound is order optimal in $m$ and $n$ by presenting a distributed learning algorithm, called Multi-Resolution Estimator for Non-Convex loss function (MRE-NC), whose expected loss matches the lower bound for large $mn$ up to polylogarithmic factors.
LGNov 2, 2019
Order Optimal One-Shot Distributed LearningArsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani
We consider distributed statistical optimization in one-shot setting, where there are $m$ machines each observing $n$ i.i.d. samples. Based on its observed samples, each machine then sends an $O(\log(mn))$-length message to a server, at which a parameter minimizing an expected loss is to be estimated. We propose an algorithm called Multi-Resolution Estimator (MRE) whose expected error is no larger than $\tilde{O}\big(m^{-{1}/{\max(d,2)}} n^{-1/2}\big)$, where $d$ is the dimension of the parameter space. This error bound meets existing lower bounds up to poly-logarithmic factors, and is thereby order optimal. The expected error of MRE, unlike existing algorithms, tends to zero as the number of machines ($m$) goes to infinity, even when the number of samples per machine ($n$) remains upper bounded by a constant. This property of the MRE algorithm makes it applicable in new machine learning paradigms where $m$ is much larger than $n$.
LGMay 12, 2019
One-Shot Federated Learning: Theoretical Limits and Algorithms to Achieve ThemSaber Salehkaleybar, Arsalan Sharifnassab, S. Jamaloddin Golestani
We consider distributed statistical optimization in one-shot setting, where there are $m$ machines each observing $n$ i.i.d. samples. Based on its observed samples, each machine sends a $B$-bit-long message to a server. The server then collects messages from all machines, and estimates a parameter that minimizes an expected convex loss function. We investigate the impact of communication constraint, $B$, on the expected error and derive a tight lower bound on the error achievable by any algorithm. We then propose an estimator, which we call Multi-Resolution Estimator (MRE), whose expected error (when $B\ge\log mn$) meets the aforementioned lower bound up to poly-logarithmic factors, and is thereby order optimal. We also address the problem of learning under tiny communication budget, and present lower and upper error bounds when $B$ is a constant. The expected error of MRE, unlike existing algorithms, tends to zero as the number of machines ($m$) goes to infinity, even when the number of samples per machine ($n$) remains upper bounded by a constant. This property of the MRE algorithm makes it applicable in new machine learning paradigms where $m$ is much larger than $n$.
DCMar 26, 2017
Distributed Voting/Ranking with Optimal Number of States per NodeSaber Salehkaleybar, Arsalan Sharif-Nassab, S. Jamaloddin Golestani
Considering a network with $n$ nodes, where each node initially votes for one (or more) choices out of $K$ possible choices, we present a Distributed Multi-choice Voting/Ranking (DMVR) algorithm to determine either the choice with maximum vote (the voting problem) or to rank all the choices in terms of their acquired votes (the ranking problem). The algorithm consolidates node votes across the network by updating the states of interacting nodes using two key operations, the union and the intersection. The proposed algorithm is simple, independent from network size, and easily scalable in terms of the number of choices $K$, using only $K\times 2^{K-1}$ nodal states for voting, and $K\times K!$ nodal states for ranking. We prove the number of states to be optimal in the ranking case, this optimality is conjectured to also apply to the voting case. The time complexity of the algorithm is analyzed in complete graphs. We show that the time complexity for both ranking and voting is $O(\log(n))$ for given vote percentages, and is inversely proportional to the minimum of the vote percentage differences among various choices.
DCMar 26, 2017
Token-based Function Computation with MemorySaber Salehkaleybar, S. Jamaloddin Golestani
In distributed function computation, each node has an initial value and the goal is to compute a function of these values in a distributed manner. In this paper, we propose a novel token-based approach to compute a wide class of target functions to which we refer as "Token-based function Computation with Memory" (TCM) algorithm. In this approach, node values are attached to tokens and travel across the network. Each pair of travelling tokens would coalesce when they meet, forming a token with a new value as a function of the original token values. In contrast to the Coalescing Random Walk (CRW) algorithm, where token movement is governed by random walk, meeting of tokens in our scheme is accelerated by adopting a novel chasing mechanism. We proved that, compared to the CRW algorithm, the TCM algorithm results in a reduction of time complexity by a factor of at least $\sqrt{n/\log(n)}$ in Erdös-Renyi and complete graphs, and by a factor of $\log(n)/\log(\log(n))$ in torus networks. Simulation results show that there is at least a constant factor improvement in the message complexity of TCM algorithm in all considered topologies. Robustness of the CRW and TCM algorithms in the presence of node failure is analyzed. We show that their robustness can be improved by running multiple instances of the algorithms in parallel.