Model-based Offline Quantum Reinforcement Learning
This is an incremental step toward quantum advantage in reinforcement learning, potentially benefiting researchers in quantum computing and AI if scalable quantum hardware becomes available.
The paper addresses offline reinforcement learning with quantum computing. It introduces the first model-based algorithm in which both the model and the policy are implemented as variational quantum circuits, and demonstrates it on the cart-pole benchmark, using gradient-based model training and gradient-free policy optimization.
This paper presents the first algorithm for model-based offline quantum reinforcement learning and demonstrates its functionality on the cart-pole benchmark. The model and the policy to be optimized are each implemented as variational quantum circuits. The model is trained by gradient descent to fit a pre-recorded data set. The policy is optimized with a gradient-free optimization scheme using the return estimate given by the model as the fitness function. This model-based approach allows, in principle, full realization on a quantum computer during the optimization phase and gives hope that a quantum advantage can be achieved as soon as sufficiently powerful quantum computers are available.
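The two-stage pipeline described above (fit a model to recorded data by gradient descent, then search for a policy without gradients, using the model's return estimate as fitness) can be illustrated with a purely classical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a linear model and a linear threshold policy stand in for the variational quantum circuits, a hypothetical one-dimensional system (`true_step`) stands in for cart-pole, and simple hill climbing stands in for whichever gradient-free optimizer the authors use.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical toy system (NOT cart-pole): s' = 0.9*s + 0.5*a, a in {-1, +1}.
def true_step(s, a):
    return 0.9 * s + 0.5 * a

# 1) Pre-recorded data set, collected once with a random behavior policy.
data = []
for _ in range(200):
    s = rng.uniform(-1.0, 1.0)
    a = rng.choice([-1.0, 1.0])
    data.append((s, a, true_step(s, a)))

# 2) Model training by gradient descent on the mean squared error.
#    A linear model stands in for the variational quantum circuit.
w = [0.0, 0.0, 0.0]  # weights for s, a, and a bias term

def model_step(w, s, a):
    return w[0] * s + w[1] * a + w[2]

for _ in range(500):
    grad = [0.0, 0.0, 0.0]
    for s, a, s_next in data:
        err = model_step(w, s, a) - s_next
        for i, x in enumerate((s, a, 1.0)):
            grad[i] += 2.0 * err * x / len(data)
    w = [wi - 0.1 * gi for wi, gi in zip(w, grad)]

# 3) Fitness: return estimated purely from model rollouts (no environment).
def policy(theta, s):
    return 1.0 if theta[0] * s + theta[1] > 0.0 else -1.0

def estimated_return(theta, horizon=50):
    total = 0
    for s in (-0.5, 0.0, 0.5):  # fixed start states
        for _ in range(horizon):
            s = model_step(w, s, policy(theta, s))
            if abs(s) >= 1.0:  # failure, loosely analogous to the pole falling
                break
            total += 1
    return total / 3.0

# 4) Gradient-free policy search: hill climbing with Gaussian perturbations.
theta = [0.0, 0.0]
initial_fitness = best_fitness = estimated_return(theta)
for _ in range(200):
    candidate = [t + rng.gauss(0.0, 0.5) for t in theta]
    fitness = estimated_return(candidate)
    if fitness >= best_fitness:  # keep the candidate if it is not worse
        theta, best_fitness = candidate, fitness

print("model weights:", [round(x, 3) for x in w])
print("estimated return:", initial_fitness, "->", best_fitness)
```

The point of the sketch is the data flow: the environment is touched only while recording `data`, and the policy search afterwards queries nothing but the learned model. This is the property that, per the abstract, would in principle allow the optimization phase to run entirely on a quantum computer.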