Quantum Policy Gradient in Reproducing Kernel Hilbert Space
This work addresses a gap in quantum RL research, although the contribution appears incremental: it extends kernel methods already established in quantum supervised learning to the RL setting.
The paper tackles the problem of applying kernel methods to quantum reinforcement learning by proposing quantum policy gradient algorithms with kernel policies, achieving a quadratic reduction in query complexity compared to classical counterparts.
Parametrised quantum circuits offer expressive and data-efficient representations for machine learning. Because quantum states reside in a high-dimensional Hilbert space, parametrised quantum circuits have a natural interpretation in terms of kernel methods. The representation of quantum circuits in terms of quantum kernels has been studied widely in quantum supervised learning, but has been overlooked in the context of quantum RL. This paper proposes the use of kernel policies and quantum policy gradient algorithms for quantum-accessible environments. After discussing the properties of such policies and demonstrating classical policy gradient on a coherent policy in a quantum environment, we propose parametric and non-parametric policy gradient and actor-critic algorithms with quantum kernel policies in quantum environments. This approach, implemented with both numerical and analytical quantum policy gradient techniques, allows exploiting the many advantages of kernel methods, including data-driven forms for functions (and their gradients) as well as tunable expressiveness. The proposed approach is suitable for vector-valued action spaces, and each formulation demonstrates a quadratic reduction in query complexity compared to its classical counterpart. We propose actor-critic algorithms based on stochastic policy gradient, deterministic policy gradient, and natural policy gradient, and demonstrate additional query complexity reductions compared to quantum policy gradient algorithms under favourable conditions.
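To make the idea of a kernel policy with a data-driven gradient more concrete, the following is a minimal classical sketch, not the quantum algorithm proposed in the paper. It assumes a Gaussian policy whose vector-valued mean is a weighted sum of RBF kernel evaluations at a set of centre states, and estimates the policy gradient with a plain REINFORCE-style Monte Carlo average. All names (rbf_kernel, policy_mean, log_prob_gradient, reinforce_gradient), the choice of RBF kernel, and the fixed noise scale sigma are assumptions made for illustration only.

```python
import math

def rbf_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel between two state vectors (assumed kernel choice)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)

def policy_mean(state, centres, weights, lengthscale=1.0):
    """Mean action f(s) = sum_i k(s, c_i) * w_i, with one weight vector per centre."""
    action_dim = len(weights[0])
    mean = [0.0] * action_dim
    for c, w in zip(centres, weights):
        k = rbf_kernel(state, c, lengthscale)
        for j in range(action_dim):
            mean[j] += k * w[j]
    return mean

def log_prob_gradient(state, action, centres, weights, sigma=0.5, lengthscale=1.0):
    """Gradient of log N(action | f(state), sigma^2 I) w.r.t. each weight vector.

    For this policy the gradient for centre i is k(s, c_i) * (a - f(s)) / sigma^2.
    """
    mean = policy_mean(state, centres, weights, lengthscale)
    residual = [(a - m) / sigma ** 2 for a, m in zip(action, mean)]
    return [[rbf_kernel(state, c, lengthscale) * r for r in residual] for c in centres]

def reinforce_gradient(samples, centres, weights, sigma=0.5, lengthscale=1.0):
    """Monte Carlo policy gradient: average of return * grad log pi over samples.

    samples is a list of (state, action, return) triples.
    """
    grad = [[0.0] * len(weights[0]) for _ in centres]
    for state, action, ret in samples:
        g = log_prob_gradient(state, action, centres, weights, sigma, lengthscale)
        for i in range(len(centres)):
            for j in range(len(g[i])):
                grad[i][j] += ret * g[i][j] / len(samples)
    return grad
```

In this sketch the gradient has the same shape as the weights (one action-sized vector per centre), so a gradient ascent step simply adds a scaled copy of the result to the weights. The number of centres plays the role of the tunable expressiveness mentioned above: more centres give a richer policy at the cost of more kernel evaluations per sample.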