QUANT-PH · LG · Jan 21, 2024

VQC-Based Reinforcement Learning with Data Re-uploading: Performance and Trainability

arXiv:2401.11555v2 · 18 citations · Quantum Machine Intelligence
Originality: Incremental advance
AI Analysis

This work addresses the problem of trainability in quantum reinforcement learning, but the advance is incremental because it builds on existing VQC and Deep Q-Learning methods.

This work empirically studies the performance and trainability of Variational Quantum Circuit (VQC)-based Deep Q-Learning models in classic control benchmarks. It shows that gradients remain substantial during training and that increasing the number of qubits does not make them vanish exponentially, contrary to what the Barren Plateau Phenomenon would predict.

Reinforcement Learning (RL) consists of designing agents that make intelligent decisions without human supervision. When used alongside function approximators such as Neural Networks (NNs), RL is capable of solving extremely complex problems. Deep Q-Learning, an RL algorithm that uses Deep NNs, has achieved super-human performance in some specific tasks. Nonetheless, it is also possible to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms. This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in classic control benchmark environments. More specifically, we investigate how data re-uploading affects both of these metrics. We show that the magnitude and the variance of the gradients of these models remain substantial throughout training due to the moving targets of Deep Q-Learning. Moreover, we empirically show that increasing the number of qubits does not lead to an exponential vanishing of the magnitude and variance of the gradients for a parameterized quantum circuit (PQC) approximating a 2-design, contrary to what the Barren Plateau Phenomenon would predict. This hints at the possibility that VQCs are especially well suited for use as function approximators in such a context.
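To make the ideas of data re-uploading and gradient variance concrete, here is a minimal NumPy sketch. It is not the architecture or the code from the paper: the single-qubit circuit, the choice of an RX encoding gate followed by an RY variational gate in each layer, the Pauli-Z observable, and the function names (`expectation_z`, `parameter_shift_grad`, `gradient_variance`) are all assumptions made for illustration. The input `x` is encoded again in every layer, which is what "re-uploading" refers to, and the gradient is computed with the parameter-shift rule.

```python
import numpy as np

# Illustrative single-qubit model; the gate choices are assumptions, not the paper's circuit.

def rx(angle):
    """Rotation about the X axis: exp(-i * angle / 2 * X)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(angle):
    """Rotation about the Y axis: exp(-i * angle / 2 * Y)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

Z = np.diag([1.0, -1.0])

def expectation_z(x, thetas):
    """<Z> after len(thetas) layers, each re-encoding x and then applying RY(theta)."""
    state = np.array([1.0, 0.0], dtype=complex)  # start in |0>
    for theta in thetas:
        state = rx(x) @ state      # data encoding, repeated in every layer
        state = ry(theta) @ state  # trainable rotation
    return float(np.real(np.conj(state) @ (Z @ state)))

def parameter_shift_grad(x, thetas):
    """Exact gradient of <Z> w.r.t. each theta via the parameter-shift rule."""
    thetas = np.asarray(thetas, dtype=float)
    grad = np.zeros_like(thetas)
    for i in range(len(thetas)):
        shift = np.zeros_like(thetas)
        shift[i] = np.pi / 2
        plus = expectation_z(x, thetas + shift)
        minus = expectation_z(x, thetas - shift)
        grad[i] = 0.5 * (plus - minus)
    return grad

def gradient_variance(x, n_layers, n_samples=200, seed=0):
    """Variance of the first partial derivative over uniformly random parameters."""
    rng = np.random.default_rng(seed)
    grads = [
        parameter_shift_grad(x, rng.uniform(0.0, 2.0 * np.pi, n_layers))[0]
        for _ in range(n_samples)
    ]
    return float(np.var(grads))
```

For instance, `parameter_shift_grad(0.0, [np.pi / 2])` evaluates to `[-1.0]`, since with `x = 0` the model reduces to `cos(theta)`. A barren-plateau study would track a quantity like `gradient_variance` as the number of qubits grows; this one-qubit sketch only shows how such a statistic can be estimated, not the scaling behavior reported in the abstract.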

Code Implementations: 1 repo
