Manojlo Vukovic

OC
h-index22
3papers
Novelty58%
AI Score40

3 Papers

29.0ITMar 21
Tackling heavy-tailed noise in distributed estimation: Asymptotic performance and tradeoffs

Dragana Bajovic, Dusan Jakovetic, Soummya Kar et al.

We present an algorithm for distributed estimation of an unknown vector parameter $\boldsymbolθ^\ast \in {\mathbb R}^M$ in the presence of heavy-tailed observation and communication noises. Heavy-tailed noises frequently appear, e.g., in densely deployed Internet of Things (IoT) or wireless sensor network systems. The presented algorithm falls within the class of \emph{consensus+innovation} estimators and combats the effect of the heavy-tailed noises by adding general nonlinearities in the consensus and innovations update parts. We present results on almost sure convergence and asymptotic normality of the estimator. In addition, we provide novel analytical studies that reveal interesting tradeoffs between the system noises and the underlying network topology.

60.5OCMay 8
Robust stochastic first order methods in heavy-tailed noise via medoid mini-batch gradient sampling

Manojlo Vukovic, Dusan Jakovetic

We consider a first order stochastic optimization framework where, at each iteration, $K$ independent identically distributed (i.i.d.) data point samples are drawn, based on which stochastic gradients can be queried. We allow gradient noise to be heavy-tailed, with possibly infinite variances. For the considered heavy-tailed setting, many algorithmic variants have recently been proposed based on gradient clipping or other nonlinear operators (e.g., normalization) applied over noisy gradients. In this paper, we take an alternative approach and propose a novel stochastic first order method dubbed Robust Stochastic Gradient Descent with medoid mini-batch gradient sampling, R-SGD-Mini for short. The core idea of R-SGD-Mini is to split the $K$-sized data batch into $M$ distinct data chunks, form for each chunk the stochastic gradient, and update the solution estimate with respect to the stochastic gradient direction of the chunk that is medoid of gradients of all data-chunks. Under a general class of symmetric heavy-tailed gradient noises and a standard non-convex setting, we establish explicit bounds on the expected time-averaged squared gradient norm. More precisely, we show that the latter quantity converges at rate $\mathcal{O}(T^{-1})$ to a small neighborhood of zero; we explicitly characterize this neighborhood in terms of noise and algorithm's parameters. Moreover, if the time horizon is known in advance, we establish the rate of $\mathcal{O}(T^{-\frac{1}{2}}).$ Furthermore, when clipping is incorporated, we obtain convergence guaranties in the high-probability sense and recover the same rate. Experimental results indicate that R-SGD-Mini and its clipped variant consistently perform favorably compared to SGD, clipped SGD and Median-of-Means based methods.

OCMay 30, 2025
Distributed gradient methods under heavy-tailed communication noise

Manojlo Vukovic, Dusan Jakovetic, Dragana Bajovic et al.

We consider a standard distributed optimization problem in which networked nodes collaboratively minimize the sum of their locally known convex costs. For this setting, we address for the first time the fundamental problem of design and analysis of distributed methods to solve the above problem when inter-node communication is subject to \emph{heavy-tailed} noise. Heavy-tailed noise is highly relevant and frequently arises in densely deployed wireless sensor and Internet of Things (IoT) networks. Specifically, we design a distributed gradient-type method that features a carefully balanced mixed time-scale time-varying consensus and gradient contribution step sizes and a bounded nonlinear operator on the consensus update to limit the effect of heavy-tailed noise. Assuming heterogeneous strongly convex local costs with mutually different minimizers that are arbitrarily far apart, we show that the proposed method converges to a neighborhood of the network-wide problem solution in the mean squared error (MSE) sense, and we also characterize the corresponding convergence rate. We further show that the asymptotic MSE can be made arbitrarily small through consensus step-size tuning, possibly at the cost of slowing down the transient error decay. Numerical experiments corroborate our findings and demonstrate the resilience of the proposed method to heavy-tailed (and infinite variance) communication noise. They also show that existing distributed methods, designed for finite-communication-noise-variance settings, fail in the presence of infinite variance noise.