QUANT-PH | LG | Apr 26, 2023

Quantum Natural Policy Gradients: Towards Sample-Efficient Reinforcement Learning

arXiv:2304.13571v2 | 14 citations | h-index: 36
Originality: Incremental advance
AI Analysis

This work addresses sample inefficiency in reinforcement learning. The advance is incremental, since it builds on existing quantum and gradient-based methods.

The paper tackles the high cost of reinforcement learning by proposing the quantum natural policy gradient (QNPG) algorithm, which uses variational quantum circuits and an efficient approximation of the quantum Fisher information matrix to improve sample efficiency. It reports faster and more stable convergence than first-order training on Contextual Bandits environments, and shows feasibility by training on a 12-qubit hardware device.

Reinforcement learning is a growing field in AI with a lot of potential. Intelligent behavior is learned automatically through trial and error in interaction with the environment. However, this learning process is often costly. Using variational quantum circuits as function approximators can potentially reduce this cost. To implement this, we propose the quantum natural policy gradient (QNPG) algorithm -- a second-order gradient-based routine that takes advantage of an efficient approximation of the quantum Fisher information matrix. We experimentally demonstrate that QNPG outperforms first-order training on Contextual Bandits environments in convergence speed and stability and, moreover, reduces the sample complexity. Furthermore, we provide evidence for the practical feasibility of our approach by training on a 12-qubit hardware device.
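
To make the idea of a second-order policy update concrete, here is a minimal sketch of a classical natural policy gradient step on a toy contextual bandit. It is not the paper's QNPG: the policy is a linear-softmax model rather than a variational quantum circuit, and the preconditioner is an empirical classical Fisher matrix rather than an approximation of the quantum Fisher information matrix. All names (`score`, `natural_gradient_step`, `train_toy_bandit`), the toy reward, and the values of `lr` and `damping` are assumptions made for illustration only.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def score(theta, context, action):
    """Flattened gradient of log pi(action | context) for a linear-softmax policy.

    theta has shape (n_actions, n_features); logits are theta @ context.
    """
    probs = softmax(theta @ context)
    grad = -np.outer(probs, context)
    grad[action] += context
    return grad.ravel()


def natural_gradient_step(theta, batch, lr=0.5, damping=1e-3):
    """One natural-gradient ascent step: theta + lr * F^{-1} g.

    batch is a list of (context, action, reward) tuples.
    ASSUMPTION: F is the empirical classical Fisher matrix of the softmax
    policy, used here as a stand-in for a quantum Fisher information matrix.
    """
    scores = np.array([score(theta, c, a) for c, a, _ in batch])
    rewards = np.array([r for _, _, r in batch], dtype=float)
    advantages = rewards - rewards.mean()          # mean-reward baseline
    grad = (scores * advantages[:, None]).mean(axis=0)
    fisher = scores.T @ scores / len(batch)
    fisher += damping * np.eye(fisher.shape[0])    # damping keeps F invertible
    direction = np.linalg.solve(fisher, grad)
    return theta + lr * direction.reshape(theta.shape)


def train_toy_bandit(n_steps=30, batch_size=64, seed=0):
    """Hypothetical toy task: two one-hot contexts, reward 1 if action == context index."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((2, 2))
    for _ in range(n_steps):
        batch = []
        for _ in range(batch_size):
            k = rng.integers(2)
            context = np.eye(2)[k]
            action = rng.choice(2, p=softmax(theta @ context))
            batch.append((context, action, float(action == k)))
        theta = natural_gradient_step(theta, batch)
    return theta
```

Setting `fisher` to the identity in this sketch recovers a plain first-order policy gradient step, which is the kind of baseline the abstract compares against. The damping term is needed because the softmax Fisher matrix is singular along directions that shift all action logits equally.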

Code Implementations: 1 repo
