Concentration of Contractive Stochastic Approximation and Reinforcement Learning
This work provides theoretical guarantees, in the form of concentration bounds, for reinforcement learning algorithms such as asynchronous Q-learning and TD(0), which bear on their stability and convergence in practice.
The paper derives concentration bounds for stochastic approximation algorithms with contractive maps under martingale difference and Markov noise, and applies them to reinforcement learning algorithms such as asynchronous Q-learning and TD(0).
Using a martingale concentration inequality, concentration bounds `from time $n_0$ on' are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0).
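To make the setting concrete, the following is a minimal sketch of tabular TD(0) on a two-state Markov chain, written as a stochastic approximation iteration toward the fixed point of a contractive map (the Bellman operator of a fixed policy, a $\gamma$-contraction in the sup norm). The transition matrix `P`, rewards `r`, discount `gamma`, and the step size `1 / visits` are hypothetical choices for illustration and are not taken from the paper; the sketch only shows the kind of iterate whose deviation from the fixed point a concentration bound would control.

```python
import numpy as np


def td0(P, r, gamma, n_steps, seed=0):
    """Tabular TD(0) along a single trajectory of a finite Markov chain.

    Only the currently visited state is updated at each step, so the noise
    comes from the Markov chain itself rather than from i.i.d. sampling.
    """
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    V = np.zeros(n_states)                  # current value estimate
    visits = np.zeros(n_states, dtype=int)  # per-state update counts
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])
        visits[s] += 1
        a = 1.0 / visits[s]  # assumed step size, decreasing in the visit count
        # Stochastic approximation step: V[s] += a * (noisy Bellman target - V[s])
        V[s] += a * (r[s] + gamma * V[s_next] - V[s])
        s = s_next
    return V


# Hypothetical two-state example (not from the paper).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.5

# Exact fixed point of the Bellman operator V -> r + gamma * P @ V.
V_star = np.linalg.solve(np.eye(2) - gamma * P, r)

V_hat = td0(P, r, gamma, n_steps=50_000)
err = np.max(np.abs(V_hat - V_star))  # sup-norm distance to the fixed point
print("sup-norm error:", err)
```

Rerunning with different seeds and recording how often `err` exceeds a threshold beyond some step count gives an empirical counterpart to a "from time $n_0$ on" statement, though it does not reproduce the paper's bounds.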