BCQQ: Batch-Constraint Quantum Q-Learning with Cyclic Data Re-uploading
This work addresses data efficiency for batch RL practitioners, but it is incremental: it adapts existing quantum and classical techniques without a major breakthrough.
The paper tackles data inefficiency in batch reinforcement learning by proposing a quantum-enhanced algorithm that uses variational quantum circuits as function approximators, achieving performance comparable to classical methods on the CartPole environment.
Deep reinforcement learning (DRL) often requires large amounts of data and many environment interactions, making the training process time-consuming. This challenge is further exacerbated in batch RL, where the agent is trained solely on a pre-collected dataset without any environment interaction. Recent advances in quantum computing suggest that quantum models might require less training data than classical methods. In this paper, we investigate this potential advantage by proposing a batch RL algorithm that uses variational quantum circuits (VQCs) as function approximators within the discrete batch-constraint deep Q-learning (BCQ) algorithm. Additionally, we introduce a novel data re-uploading scheme that cyclically shifts the order of the input variables across the data encoding layers. We evaluate the efficiency of our algorithm on the OpenAI CartPole environment and compare its performance to that of the classical neural network-based discrete BCQ.
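To make the cyclic re-uploading idea concrete, the following is a minimal sketch of how the input order could be permuted per encoding layer. It is an illustration under stated assumptions, not the paper's implementation: the function name `cyclic_encoding_orders`, the `shift` parameter, the left-rotation direction, and the choice of one input variable per qubit are all hypothetical, and the quantum circuit itself is omitted.

```python
import numpy as np


def cyclic_encoding_orders(state, n_layers, shift=1):
    """Return one reordered copy of `state` per data-encoding layer.

    Assumption (not confirmed by the abstract): layer l receives the state
    rotated left by l * shift positions, so the q-th encoding gate in
    layer l sees state[(q + l * shift) % len(state)].
    """
    state = np.asarray(state, dtype=float)
    return np.stack(
        [np.roll(state, -layer * shift) for layer in range(n_layers)]
    )


# CartPole observation: cart position, cart velocity, pole angle,
# pole angular velocity (values here are made up for illustration).
obs = np.array([0.1, -0.2, 0.03, 0.4])

# Row l holds the variable order fed to encoding layer l.
orders = cyclic_encoding_orders(obs, n_layers=4)
```

With four input variables and four layers, each column of `orders` contains every variable exactly once, so under this assumed scheme each qubit would encode every state variable at some layer rather than re-encoding the same one repeatedly.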