SY Feb 2, 2018
Distributed Time Synchronization for Networks with Random Delays and Measurement Noise
Milos S. Stankovic, Srdjan S. Stankovic, Karl Henrik Johansson
In this paper, a new distributed asynchronous algorithm is proposed for time synchronization in networks with random communication delays, measurement noise and communication dropouts. Three different types of the drift correction algorithm are introduced, based on different kinds of local time increments. Under nonrestrictive conditions concerning network properties, it is proved that all the algorithm types provide convergence in the mean square sense and with probability one (w.p.1) of the corrected drifts of all the nodes to the same value (consensus). An estimate of the convergence rate of these algorithms is derived. For offset correction, a new algorithm is proposed containing a compensation parameter that copes with the influence of random delays, together with special terms that account for the influence of both linearly increasing time and drift correction. It is proved that the corrected offsets of all the nodes converge in the mean square sense and w.p.1. An efficient offset correction algorithm based on consensus on local compensation parameters is also proposed. It is shown that the overall time synchronization algorithm can also be implemented as a flooding algorithm with one reference node. It is proved that it is possible to achieve bounded error between local corrected clocks in the mean square sense and w.p.1. Simulation results provide additional practical insight into the algorithm properties and show its advantage over existing methods.
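The consensus idea behind drift correction can be illustrated with a toy sketch. It is not the algorithm from the paper: it assumes no delays, no measurement noise and no dropouts, and it assumes each node can read its neighbor's corrected drift exactly. The function name `gossip_drift_consensus`, the constant `gain`, and the ring topology in the usage note are all hypothetical choices made for illustration.

```python
import random


def gossip_drift_consensus(drifts, neighbors, steps=5000, gain=0.5, seed=0):
    """Toy asynchronous gossip on corrected clock drifts.

    drifts[i]    -- true (unknown) clock rate of node i
    neighbors[i] -- list of nodes that node i can hear
    Returns the corrected drifts corr[i] * drifts[i] after `steps` updates.
    Assumption: noise-free, delay-free relative drift measurements.
    """
    rng = random.Random(seed)
    corr = [1.0] * len(drifts)  # multiplicative drift correction per node
    for _ in range(steps):
        # Asynchronous operation: one randomly chosen node wakes up per tick.
        i = rng.randrange(len(drifts))
        j = rng.choice(neighbors[i])
        # Neighbor's corrected drift expressed in node i's own time units.
        relative = (corr[j] * drifts[j]) / drifts[i]
        # Move node i's correction part of the way toward that value.
        corr[i] += gain * (relative - corr[i])
    return [c * d for c, d in zip(corr, drifts)]
```

For example, with `drifts = [0.98, 1.0, 1.01, 1.03]` on a four-node ring `[[1, 3], [0, 2], [1, 3], [2, 0]]`, the returned corrected drifts agree closely with one another, and the common value lies between the smallest and largest true drift, since every update is a convex combination of two corrected drifts.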
LG Jun 18, 2020
Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning
Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic
In this paper, we propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes with strict information structure constraints, limiting inter-agent communications to small neighborhoods. The algorithms are composed of: 1) local parameter updates based on single-agent off-policy gradient temporal difference learning algorithms, including eligibility traces with state-dependent parameters, and 2) linear stochastic time-varying consensus schemes, represented by directed graphs. The proposed algorithms differ in their form, definition of eligibility traces, selection of time scales and the way of incorporating consensus iterations. The main contribution of the paper is a convergence analysis based on the general properties of the underlying Feller-Markov processes and the stochastic time-varying consensus model. We prove, under general assumptions, that the parameter estimates generated by all the proposed algorithms converge weakly to the solutions of the corresponding ordinary differential equations (ODEs) with precisely defined invariant sets. It is demonstrated how the adopted methodology can be applied to temporal-difference algorithms under weaker information structure constraints. The variance reduction effect of the proposed algorithms is demonstrated by formulating and analyzing an asymptotic stochastic differential equation. Specific guidelines for communication network design are provided. The algorithms' superior properties are illustrated by characteristic simulation results.
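The two-part structure described above (a local temporal difference update followed by a consensus step) can be sketched in a heavily simplified form. This sketch swaps in plain on-policy TD(0) with one-hot features for the paper's off-policy gradient TD with eligibility traces, and a fixed row-stochastic matrix for the stochastic time-varying consensus scheme. The two-state deterministic chain, the function name `distributed_td0`, and all parameter values are assumptions made for illustration.

```python
def distributed_td0(weights, start_states, steps=2000, alpha=0.1, gamma=0.5):
    """Toy distributed TD(0): local update, then consensus averaging.

    weights      -- row-stochastic matrix; weights[i][j] > 0 means agent i
                    listens to agent j (hypothetical fixed communication graph)
    start_states -- initial state (0 or 1) of each agent's own trajectory
    Returns one value estimate [V(0), V(1)] per agent.
    """
    next_state = [1, 0]   # deterministic chain: 0 -> 1 -> 0 -> ...
    reward = [1.0, 0.0]   # reward received when leaving each state
    n = len(weights)
    theta = [[0.0, 0.0] for _ in range(n)]
    states = list(start_states)
    for _ in range(steps):
        # 1) Local TD(0) step on each agent's own observed transition.
        for i in range(n):
            s = states[i]
            s2 = next_state[s]
            delta = reward[s] + gamma * theta[i][s2] - theta[i][s]
            theta[i][s] += alpha * delta
            states[i] = s2
        # 2) Consensus step: each agent averages its neighbors' parameters.
        theta = [
            [sum(weights[i][j] * theta[j][k] for j in range(n)) for k in range(2)]
            for i in range(n)
        ]
    return theta
```

With `weights = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]` and `start_states = [0, 1, 0]`, every agent's estimate approaches the exact values of this chain, V(0) = 4/3 and V(1) = 2/3, which follow from V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0).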