ML LG OC

Feb 2, 2019

Non-asymptotic Analysis of Biased Stochastic Approximation Scheme

Belhal Karimi, Blazej Miasojedow, Eric Moulines, Hoi-To Wai

arXiv:1902.00629v424.1105 citations, h-index: 59

Originality: Incremental advance

AI Analysis

This work relaxes restrictive assumptions in the non-asymptotic analysis of stochastic approximation, which is relevant to machine learning researchers working on online and reinforcement learning. It appears incremental in that it extends prior analyses.

Prior non-asymptotic analyses of stochastic approximation schemes rely on restrictive assumptions such as unbiased gradient estimates and convex objectives. The paper relaxes these assumptions to cover non-convex, smooth objectives with updates that are driven by a state-dependent Markov chain and may be biased, and it illustrates the results with the online EM algorithm and the policy-gradient method.

Stochastic approximation (SA) is a key method used in statistical learning. Recently, its non-asymptotic convergence analysis has been considered in many papers. However, most of the prior analyses are made under restrictive assumptions such as unbiased gradient estimates and a convex objective function, which significantly limit their applications to sophisticated tasks such as online and reinforcement learning. These restrictions are all essentially relaxed in this work. In particular, we analyze a general SA scheme to minimize a non-convex, smooth objective function. We consider an update procedure whose drift term depends on a state-dependent Markov chain and whose mean field is not necessarily of gradient type, covering approximate second-order methods and allowing asymptotic bias for the one-step updates. We illustrate these settings with the online EM algorithm and the policy-gradient method for average reward maximization in reinforcement learning.
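
To make the setting concrete, here is a toy sketch of an SA recursion whose drift is both biased and driven by a state-dependent Markov chain. It is an illustration only, not the authors' algorithm. The objective V(eta) = eta^2 / 2 (convex, chosen only to keep the example short), the AR(1) noise chain, the constant bias, the 1/sqrt(n) step size, and the function name `biased_sa` are all assumptions made for this example.

```python
import random


def biased_sa(eta0=2.0, bias=0.1, rho=0.5, n_steps=20000, seed=0):
    """Toy biased SA recursion: eta <- eta - gamma_n * H(eta, x).

    Hypothetical setup (not from the paper):
      - V(eta) = eta**2 / 2, so grad V(eta) = eta
      - x is an AR(1) Markov chain whose noise scale depends on eta
      - the drift H adds a constant bias, so the mean field is eta + bias
    """
    rng = random.Random(seed)
    eta, x = eta0, 0.0
    for n in range(1, n_steps + 1):
        # State-dependent Markov chain: the innovation scale depends on eta.
        scale = 0.5 / (1.0 + eta * eta)
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        # Drift = gradient of V + constant bias + Markovian noise.
        drift = eta + bias + x
        # Decreasing step size gamma_n = 1 / sqrt(n).
        gamma = 1.0 / n ** 0.5
        eta -= gamma * drift
    return eta


if __name__ == "__main__":
    print(biased_sa())
```

In this toy, the mean field vanishes at eta = -bias rather than at the minimizer eta = 0, so the iterate settles near a point where the gradient of V is of the order of the bias. This is a simple instance of the asymptotic bias that the abstract says the analysis allows.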
