QUANT-PH | LG | Jul 21, 2025

Automated Design of Structured Variational Quantum Circuits with Reinforcement Learning

arXiv:2507.16001v1
Originality Highly original
AI Analysis

This work addresses the problem of improving variational quantum algorithms for near-term quantum hardware, offering an incremental advance by automating circuit design with reinforcement learning.

The authors tackle the challenge of designing variational quantum circuits for combinatorial optimization by framing circuit synthesis as a sequential decision-making problem and introducing two reinforcement learning methods, RLVQC Block and RLVQC Global. Their results show that RLVQC Block consistently outperforms the Quantum Approximate Optimization Algorithm (QAOA) and generally surpasses RLVQC Global, while RLVQC Global finds significantly shorter circuits.

Variational Quantum Algorithms (VQAs) are among the most promising approaches for leveraging near-term quantum hardware, yet their effectiveness strongly depends on the design of the underlying circuit ansatz, which is typically constructed with heuristic methods. In this work, we represent the synthesis of variational quantum circuits as a sequential decision-making problem, where gates are added iteratively in order to optimize an objective function, and we introduce two reinforcement learning-based methods, RLVQC Global and RLVQC Block, tailored to combinatorial optimization problems. RLVQC Block creates ansatzes that generalize the Quantum Approximate Optimization Algorithm (QAOA) by discovering a two-qubit block that is applied to all interacting qubit pairs. RLVQC Global, in contrast, further generalizes the ansatz by adding gates unconstrained by the structure of the interacting qubits. Both methods adopt the Proximal Policy Optimization (PPO) algorithm and use empirical measurement outcomes as state observations to guide the agent. We evaluate the proposed methods on a broad set of QUBO instances derived from classical graph-based optimization problems. Our results show that both RLVQC methods perform strongly, with RLVQC Block consistently outperforming QAOA and generally surpassing RLVQC Global. While RLVQC Block produces circuits with depth comparable to QAOA, the Global variant is instead able to find significantly shorter ones. These findings suggest that reinforcement learning methods can be an effective tool to discover new ansatz structures tailored for specific problems and that the most effective circuit design strategy lies between rigid predefined architectures and completely unconstrained ones, offering a favourable trade-off between structure and adaptability.
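
As an illustration of the objects the abstract refers to, the following minimal Python sketch builds a QUBO matrix from a graph problem and scores a batch of measured bitstrings against it. It is not the authors' code: the choice of MaxCut as the graph problem, the sign convention (minimizing x^T Q x), and all function names (`maxcut_qubo`, `qubo_cost`, `empirical_cost`, `brute_force_minimum`) are assumptions made for illustration. The sample bitstrings are hand-written stand-ins for circuit measurement outcomes, not results from the paper.

```python
import itertools

import numpy as np


def maxcut_qubo(n_nodes, edges):
    """Symmetric QUBO matrix Q such that x^T Q x = -(cut size) for binary x.

    Assumed convention: each edge (i, j) contributes -(x_i + x_j - 2 x_i x_j).
    """
    Q = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        Q[i, i] -= 1.0
        Q[j, j] -= 1.0
        Q[i, j] += 1.0
        Q[j, i] += 1.0
    return Q


def qubo_cost(Q, x):
    """QUBO objective x^T Q x for one bitstring x."""
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)


def empirical_cost(Q, samples):
    """Average QUBO cost over a batch of measured bitstrings."""
    return float(np.mean([qubo_cost(Q, s) for s in samples]))


def brute_force_minimum(Q):
    """Exact minimum by enumeration; only feasible for very small instances."""
    n = Q.shape[0]
    return min(qubo_cost(Q, bits) for bits in itertools.product((0, 1), repeat=n))


# 4-node cycle graph; the samples are hypothetical measurement outcomes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
Q = maxcut_qubo(4, edges)
samples = [(0, 1, 0, 1), (0, 0, 1, 1), (0, 0, 0, 0)]

print(empirical_cost(Q, samples))  # -2.0
print(brute_force_minimum(Q))      # -4.0
```

In a setup like the one described, a quantity such as `empirical_cost` could serve as the objective that a gate-adding agent tries to lower step by step, and the ratio to `brute_force_minimum` gives a simple quality measure on instances small enough to enumerate.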

