QUANT-PHLGDec 15, 2021

Quantum Algorithms for Reinforcement Learning with a Generative Model

arXiv:2112.08451v138 citations
Originality Highly original
AI Analysis

This work addresses the sample efficiency challenge in reinforcement learning for researchers and practitioners, offering a novel quantum approach with proven speedups, though it is incremental in applying quantum computing to a known bottleneck.

The paper tackles the problem of reinforcement learning in Markov decision processes by designing quantum algorithms that approximate optimal policies, value functions, and Q-functions, achieving quadratic speedups over classical methods in terms of approximation accuracy, effective time horizon, and action space size, with a proven optimal quantum lower bound for computing the Q-function.

Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an optimal policy for a $γ$-discounted Markov decision process (MDP). For such an MDP, we design quantum algorithms that approximate an optimal policy ($π^*$), the optimal value function ($v^*$), and the optimal $Q$-function ($q^*$), assuming the algorithms can access samples from the environment in quantum superposition. This assumption is justified whenever there exists a simulator for the environment; for example, if the environment is a video game or some other program. Our quantum algorithms, inspired by value iteration, achieve quadratic speedups over the best-possible classical sample complexities in the approximation accuracy ($ε$) and two main parameters of the MDP: the effective time horizon ($\frac{1}{1-γ}$) and the size of the action space ($A$). Moreover, we show that our quantum algorithm for computing $q^*$ is optimal by proving a matching quantum lower bound.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes