Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences
This work addresses real-time optimization of pulse sequences for qubits in quantum computing. The contribution is incremental: it applies reinforcement learning (RL) to a known bottleneck in noise mitigation.
The paper tackles phase decoherence in qubits by designing optimal dynamical decoupling pulse sequences without explicit knowledge of the noise. It uses a reinforcement learning method with a group-theoretic action set and demonstrates that the RL agent can learn sequences that minimize dephasing.
Dynamical decoupling seeks to mitigate phase decoherence in qubits by applying a carefully designed sequence of effectively instantaneous electromagnetic pulses. Although analytic solutions exist for pulse timings that are optimal under specific noise regimes, identifying the optimal timings for a realistic noise spectrum remains challenging. We propose a reinforcement learning (RL)-based method for designing pulse sequences on qubits. Our novel action set enables the RL agent to efficiently navigate this inherently non-convex optimization landscape. The action set, derived from Thompson's group $F$, is applicable to a broad class of sequential decision problems whose states can be represented as bounded sequences. We demonstrate that our RL agent can learn pulse sequences that minimize dephasing without requiring explicit knowledge of the underlying noise spectrum. This work opens the possibility of real-time learning of optimal dynamical decoupling sequences on dephasing-limited qubits. The model-free nature of our algorithm suggests that the agent may ultimately learn optimal pulse sequences even in the presence of unmodeled physical effects, such as pulse errors or non-Gaussian noise.
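To make the idea of a group-theoretic action set concrete, the following is a minimal sketch, not the authors' implementation. It assumes that pulse timings are normalized to the interval [0, 1] and that each action applies one of the two standard generators of Thompson's group $F$ (or its inverse) to every timing. The function names (`x0`, `x1`, `apply_action`, `cpmg_timings`, `uhrig_timings`) are hypothetical, and how the paper actually derives its actions from $F$ may differ. The CPMG and Uhrig timings are included only as familiar analytic starting sequences.

```python
import math

# Standard generators of Thompson's group F, viewed as piecewise-linear,
# order-preserving maps of [0, 1] with dyadic breakpoints and slopes 1/2, 1, 2.

def x0(t):
    if t <= 0.5:
        return t / 2
    if t <= 0.75:
        return t - 0.25
    return 2 * t - 1

def x1(t):
    if t <= 0.5:
        return t
    if t <= 0.75:
        return t / 2 + 0.25
    if t <= 0.875:
        return t - 0.125
    return 2 * t - 1

def x0_inv(t):
    if t <= 0.25:
        return 2 * t
    if t <= 0.5:
        return t + 0.25
    return (t + 1) / 2

def x1_inv(t):
    if t <= 0.5:
        return t
    if t <= 0.625:
        return 2 * t - 0.5
    if t <= 0.75:
        return t + 0.125
    return (t + 1) / 2

# Assumed action set: the two generators and their inverses.
ACTIONS = {"x0": x0, "x1": x1, "x0_inv": x0_inv, "x1_inv": x1_inv}

def apply_action(timings, name):
    """Apply one group element to every normalized pulse time.

    Each map is a strictly increasing bijection of [0, 1], so the
    pulse order and the bounds of the sequence are preserved.
    """
    f = ACTIONS[name]
    return [f(t) for t in timings]

def cpmg_timings(n):
    """Equally spaced CPMG pulse times, normalized to total time 1."""
    return [(j - 0.5) / n for j in range(1, n + 1)]

def uhrig_timings(n):
    """Uhrig (UDD) pulse times, normalized to total time 1."""
    return [math.sin(math.pi * j / (2 * n + 2)) ** 2 for j in range(1, n + 1)]

if __name__ == "__main__":
    seq = cpmg_timings(4)
    print("start :", seq)
    for name in ["x0", "x1", "x0_inv"]:
        seq = apply_action(seq, name)
        print(f"{name:>6} :", seq)
```

Because every element of $F$ is an increasing bijection of [0, 1], any composition of these actions maps a valid, ordered pulse sequence to another valid, ordered pulse sequence. Under the assumptions above, this is one way an agent could explore the space of bounded sequences without ever leaving it; the reward (for example, a measured coherence) would be supplied by the environment and is not modeled here.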