18.8 | ML | Jun 4
Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples
Ziad Kobeissi, Éloïse Berthier
In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method. We establish a new convergence rate for the Mean-Square Error (MSE) on the approximated function that is (i) fast, in the sense that it has an optimal dependence on the number of iterations k (i.e., of order 1/k); (ii) robust to ill-conditioning, in that it depends only on an initial error and model-independent constants; and (iii) sharp up to a multiplicative constant lower than 11. In particular, it does not depend on the smallest eigenvalue of the uncentered covariance matrix of the linear parametrization, unlike all pre-existing O(1/k) rates in the TD(0) literature. We also introduce PCTD(0), a variant of TD(0) that benefits from better convergence properties under an additional strong-mixing assumption on the Markov chain.
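As a rough illustration of the setting in this abstract, the sketch below runs TD(0) with linear features, on-policy i.i.d. samples, a constant step, and a running (Polyak-Juditsky style) average of the iterates. Everything concrete in it is an assumption made for the example and is not taken from the paper: the three-state chain `P`, the rewards `R`, the features `PHI`, the discount `GAMMA`, and the function names. The chain is chosen doubly stochastic so that its stationary distribution is uniform and i.i.d. on-policy states can be drawn directly. PCTD(0) is not shown.

```python
import random

# Hypothetical toy problem (not from the paper): 3 states, 2 features.
GAMMA = 0.5
P = [[0.5, 0.5, 0.0],   # doubly stochastic, so the stationary law is uniform
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
R = [1.0, 0.0, -1.0]
PHI = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def averaged_td0(num_iters, step, seed=0):
    """Constant-step TD(0) on i.i.d. samples; returns the averaged iterate."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    theta_bar = [0.0, 0.0]
    for k in range(1, num_iters + 1):
        s = rng.randrange(3)                              # s ~ stationary law
        s_next = rng.choices(range(3), weights=P[s])[0]   # s' ~ P(s, .)
        td_error = R[s] + GAMMA * dot(PHI[s_next], theta) - dot(PHI[s], theta)
        theta = [t + step * td_error * f for t, f in zip(theta, PHI[s])]
        # Running mean of the iterates theta_1, ..., theta_k.
        theta_bar = [tb + (t - tb) / k for tb, t in zip(theta_bar, theta)]
    return theta_bar


def td_fixed_point():
    """Solve A theta = b for the TD(0) fixed point of the toy problem."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for s in range(3):
        exp_next = [sum(P[s][t] * PHI[t][j] for t in range(3)) for j in range(2)]
        diff = [PHI[s][j] - GAMMA * exp_next[j] for j in range(2)]
        for i in range(2):
            b[i] += R[s] * PHI[s][i] / 3
            for j in range(2):
                A[i][j] += PHI[s][i] * diff[j] / 3
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # Cramer's rule for the 2x2 system.
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


if __name__ == "__main__":
    print("averaged TD(0):", averaged_td0(20000, 0.1))
    print("fixed point:   ", td_fixed_point())
```

With i.i.d. samples the update is linear in the parameter, so the averaged iterate should approach the TD fixed point as the number of iterations grows; the sketch makes no attempt to reproduce the constants or the rate stated in the abstract.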
OC | Nov 4, 2018
On the implementation of a primal-dual algorithm for second order time-dependent mean field games with local couplings
Luis Briceño-Arias, Dante Kalise, Ziad Kobeissi et al.
We study a numerical approximation of a time-dependent Mean Field Game (MFG) system with local couplings. The discretization we consider stems from a variational approach described in [Briceno-Arias, Kalise, and Silva, SIAM J. Control Optim., 2017] for the stationary problem and leads to the finite difference scheme introduced by Achdou and Capuzzo-Dolcetta in [SIAM J. Numer. Anal., 48(3):1136-1162, 2010]. To solve the finite-dimensional variational problems, the authors of [Briceno-Arias, Kalise, and Silva, SIAM J. Control Optim., 2017] implement the primal-dual algorithm introduced by Chambolle and Pock in [J. Math. Imaging Vision, 40(1):120-145, 2011], whose core consists of iteratively solving linear systems and applying a proximity operator. We apply that method to time-dependent MFGs and, for large viscosity parameters, we improve the linear system solution by replacing the direct approach used in [Briceno-Arias, Kalise, and Silva, SIAM J. Control Optim., 2017] with suitable preconditioned iterative algorithms.
LG | Feb 16, 2022
Temporal Difference Learning with Continuous Time and State in the Stochastic Setting
Ziad Kobeissi, Francis Bach
We consider the problem of continuous-time policy evaluation. This consists in learning, from observations, the value function associated with uncontrolled continuous-time stochastic dynamics and a reward function. We propose two original variants of the well-known TD(0) method using vanishing time steps. One is model-free and the other is model-based. For both methods, we prove theoretical convergence rates that we subsequently verify through numerical simulations. Alternatively, those methods can be interpreted as novel reinforcement learning approaches for approximating solutions of linear PDEs (partial differential equations) or linear BSDEs (backward stochastic differential equations).