LG OC ML · Nov 6, 2019

Improving reinforcement learning algorithms: towards optimal learning rate policies

arXiv:1911.02319v6 · 2 citations
AI Analysis

This work addresses the fundamental challenge of learning rate optimization in reinforcement learning, offering a novel approach with demonstrated performance gains in financial applications.

This paper develops a dynamic optimal policy for learning rate selection in stochastic approximation, with the aim of improving reinforcement learning algorithms. It shows that the classical convergence rate O(1/√N) is pessimistic and can be improved to O((log(N)/N)^β) with 1/2 ≤ β ≤ 1. Empirically, the methodology significantly outperforms standard algorithms in three applications: drift estimation, limit order placement, and optimal execution of shares.
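To make the two rates concrete, the following minimal sketch evaluates both expressions with all constants ignored. The function names and the use of the natural logarithm are assumptions made for illustration; the summary above does not specify them.

```python
import math


def classical_rate(n: int) -> float:
    """Classical bound 1/sqrt(N), constants ignored."""
    return 1.0 / math.sqrt(n)


def improved_rate(n: int, beta: float) -> float:
    """Bound (log(N)/N)**beta, constants ignored.

    Assumption: natural logarithm, and 1/2 <= beta <= 1 as in the abstract.
    """
    if not 0.5 <= beta <= 1.0:
        raise ValueError("beta must lie in [1/2, 1]")
    return (math.log(n) / n) ** beta


if __name__ == "__main__":
    for n in (10**3, 10**6):
        row = ", ".join(
            f"beta={b}: {improved_rate(n, b):.2e}" for b in (0.5, 0.75, 1.0)
        )
        print(f"N={n}: 1/sqrt(N)={classical_rate(n):.2e}, {row}")
```

With constants ignored, the expression at β = 1/2 exceeds 1/√N by a factor √log(N), so the improvement in this comparison appears only for β above 1/2 and is largest at β = 1.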

This paper investigates to what extent one can improve reinforcement learning algorithms. Our study is split into three parts. First, our analysis shows that the classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic and can be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\leq \beta\leq 1$ and $N$ the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate $(\gamma_k)_{k\geq 0}$ used in stochastic approximation (SA). We decompose our policy into two interacting levels: the inner and the outer level. In the inner level, we present the PASS algorithm (for "PAst Sign Search") which, based on a predefined sequence $(\gamma^o_k)_{k\geq 0}$, constructs a new sequence $(\gamma^i_k)_{k\geq 0}$ whose error decreases faster. In the outer level, we propose an optimal methodology for the selection of the predefined sequence $(\gamma^o_k)_{k\geq 0}$. Third, we show empirically that our learning rate selection methodology significantly outperforms standard algorithms used in reinforcement learning (RL) in the following three applications: the estimation of a drift, the optimal placement of limit orders, and the optimal execution of a large number of shares.
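The abstract does not spell out the PASS update rule, so the sketch below is not the authors' algorithm. It illustrates the general idea of deriving an inner learning rate sequence from a predefined one by using past signs, with Kesten's classical rule as a stand-in: the index into the predefined sequence advances only when two consecutive increments have opposite signs. The names `base_rate` and `estimate_drift`, the choice $\gamma^o_j = \gamma_0/(j+1)$, and the Gaussian toy data are all assumptions made for illustration.

```python
import random


def base_rate(j: int, gamma0: float = 1.0) -> float:
    """Hypothetical predefined (outer) sequence gamma^o_j = gamma0 / (j + 1)."""
    return gamma0 / (j + 1)


def estimate_drift(samples, theta0: float = 0.0, sign_based: bool = True) -> float:
    """Stochastic approximation of the mean (drift) of noisy samples.

    sign_based=True: Kesten-style rule (a stand-in, not PASS). The index j
    advances only when consecutive increments have opposite signs.
    sign_based=False: j advances at every step, which with gamma0 = 1
    reproduces the running sample mean.
    """
    theta = theta0
    j = 0
    prev = None  # previous increment, None before the first step
    for x in samples:
        incr = theta - x
        if prev is not None and (not sign_based or incr * prev < 0):
            j += 1
        theta -= base_rate(j) * incr
        prev = incr
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.3, 1.0) for _ in range(5000)]  # toy data, true drift 0.3
    print("plain 1/k schedule:", estimate_drift(xs, sign_based=False))
    print("sign-based schedule:", estimate_drift(xs, sign_based=True))
```

The sign test keeps the step large while the iterate still moves consistently in one direction and shrinks it once the increments start to oscillate, which is the intuition behind building a faster sequence from a predefined one.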

Code Implementations: 1 repo