Differentiable Quantum Architecture Search in Asynchronous Quantum Reinforcement Learning
This work addresses the significant quantum expertise currently needed to build effective QRL models, which makes QRL more accessible to researchers and practitioners in quantum machine learning. The contribution appears incremental, however, because it builds on existing QRL and architecture search methods.
The paper tackles the challenge of designing quantum circuit architectures for quantum reinforcement learning (QRL) by proposing differentiable quantum architecture search (DiffQAS), which automates the design process using gradient-based optimization and uses asynchronous RL for parallel training. The results show that DiffQAS-QRL achieves performance comparable to manually crafted architectures across the environments considered, and that training is stable across those environments.
The emergence of quantum reinforcement learning (QRL) is propelled by advancements in quantum computing (QC) and machine learning (ML), particularly through quantum neural networks (QNN) built on variational quantum circuits (VQC). These advancements have proven successful in addressing sequential decision-making tasks. However, constructing effective QRL models demands significant expertise, because the design of the quantum circuit architecture, including data encoding and parameterized circuits, profoundly influences model performance. In this paper, we propose addressing this challenge with differentiable quantum architecture search (DiffQAS), which makes both the circuit parameters and the structure weights trainable with gradient-based optimization. Furthermore, we enhance training efficiency through asynchronous reinforcement learning (RL) methods that facilitate parallel training. Through numerical simulations, we demonstrate that our proposed DiffQAS-QRL approach achieves performance comparable to manually crafted circuit architectures across the considered environments, showcasing stability across diverse scenarios. This methodology offers a pathway for designing QRL models without extensive quantum knowledge, ensuring robust performance and fostering broader application of QRL.
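
To make the idea of jointly trainable circuit parameters and structure weights more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single simulated qubit, three hypothetical candidate gates (RX, RY, RZ) applied after a Hadamard, a softmax over structure weights that mixes the candidates' Z expectation values, and a toy regression target standing in for an RL loss. The function names (`expval_z`, `model`, `train`) and all hyperparameters are illustrative assumptions.

```python
import cmath
import math

GATES = ["rx", "ry", "rz"]          # hypothetical candidate gate set
SQRT_HALF = 1 / math.sqrt(2)


def rotation(gate, theta):
    """Return the 2x2 matrix of a single-qubit rotation gate."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    if gate == "rx":
        return [[c, -1j * s], [-1j * s, c]]
    if gate == "ry":
        return [[c, -s], [s, c]]
    if gate == "rz":
        return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]
    raise ValueError(f"unknown gate: {gate}")


def expval_z(gate, theta):
    """<Z> after applying H and then the chosen rotation to |0>."""
    state = [SQRT_HALF, SQRT_HALF]  # H|0> = |+>
    m = rotation(gate, theta)
    a = m[0][0] * state[0] + m[0][1] * state[1]
    b = m[1][0] * state[0] + m[1][1] * state[1]
    return abs(a) ** 2 - abs(b) ** 2


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def model(thetas, alphas):
    """Softmax-weighted mixture of the candidate circuits' outputs."""
    weights = softmax(alphas)
    return sum(w * expval_z(g, t) for w, g, t in zip(weights, GATES, thetas))


def train(target=-1.0, steps=500, lr=0.5):
    """Gradient descent on (output - target)^2 over thetas and alphas."""
    thetas = [0.1, 0.1, 0.1]        # circuit (rotation) parameters
    alphas = [0.0, 0.0, 0.0]        # structure weights (pre-softmax)
    for _ in range(steps):
        weights = softmax(alphas)
        evs = [expval_z(g, t) for g, t in zip(GATES, thetas)]
        out = sum(w * e for w, e in zip(weights, evs))
        dloss = 2 * (out - target)
        # Parameter-shift rule for the derivative of <Z> w.r.t. each angle.
        devs = [
            (expval_z(g, t + math.pi / 2) - expval_z(g, t - math.pi / 2)) / 2
            for g, t in zip(GATES, thetas)
        ]
        grad_thetas = [dloss * w * d for w, d in zip(weights, devs)]
        # Softmax derivative: d out / d alpha_k = w_k * (e_k - out).
        grad_alphas = [dloss * w * (e - out) for w, e in zip(weights, evs)]
        thetas = [t - lr * g for t, g in zip(thetas, grad_thetas)]
        alphas = [a - lr * g for a, g in zip(alphas, grad_alphas)]
    return thetas, alphas


if __name__ == "__main__":
    thetas, alphas = train()
    print("structure weights:", dict(zip(GATES, softmax(alphas))))
    print("model output:", model(thetas, alphas))
```

In this toy setting only the RY candidate can change the Z expectation value of the |+> state, so the learned structure weights should concentrate on RY. A real DiffQAS-QRL model would search over multi-qubit encoding and variational blocks and would take its loss from the RL objective rather than from a fixed target.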