QUANT-PH · AI · LG · ML · Sep 29, 2025

Bandits roaming Hilbert space

arXiv:2509.24569v1 · 1 citation · h-index: 3
Originality: Highly original
AI Analysis

This work addresses the exploration-exploitation trade-off in quantum learning and is aimed at researchers in quantum information and machine learning. It offers incremental improvements, with specific gains in efficiency and applications.

The paper tackles the problem of online learning of quantum state properties using multi-armed bandits, deriving optimal strategies with regret scaling as the square root of rounds and achieving polylogarithmic regret for pure states with continuous actions. It demonstrates applications in quantum state tomography, recommender systems, and thermodynamic work extraction, showing an exponential advantage in work dissipation over tomography-based protocols.

This thesis studies the exploration-exploitation trade-off in online learning of properties of quantum states using multi-armed bandits. Given streaming access to an unknown quantum state, in each round we select an observable from a set of actions, aiming to maximize its expectation value. Using past information, we refine our actions to minimize the regret, that is, the cumulative gap between the reward obtained and the maximum possible reward. We derive information-theoretic lower bounds and optimal strategies with matching upper bounds, showing that the regret typically scales as the square root of the number of rounds. As an application, we reframe quantum state tomography so as to both learn the state efficiently and minimize measurement disturbance. For pure states and continuous action sets, we achieve polylogarithmic regret using a sample-optimal algorithm based on a weighted online least squares estimator. The algorithm relies on the optimistic principle and controls the eigenvalues of the design matrix. We also apply our framework to quantum recommender systems and to thermodynamic work extraction from unknown states. In this last setting, our results demonstrate an exponential advantage in work dissipation over tomography-based protocols.
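To make the notion of regret concrete, here is a minimal sketch of the finite-action setting for a single qubit. It is not the thesis's algorithm: it uses the textbook UCB1 rule as a stand-in for the optimistic principle, and the names `expectation`, `measure`, and `run_ucb`, the Bloch-vector parametrization, and the exploration constant are all assumptions made for illustration. Each action is an observable n·σ, its expected reward is tr(ρ n·σ) = n·r for Bloch vector r, and the regret accumulates the gap to the best observable in the set.

```python
import math
import random

def expectation(bloch, axis):
    """tr(rho O) for a qubit: dot product of the Bloch vector and the measurement axis."""
    return sum(b * n for b, n in zip(bloch, axis))

def measure(bloch, axis, rng):
    """Sample a +1/-1 outcome of the observable (axis . sigma) on the state."""
    p_plus = (1.0 + expectation(bloch, axis)) / 2.0
    return 1.0 if rng.random() < p_plus else -1.0

def run_ucb(bloch, axes, rounds, seed=0):
    """Play UCB1 over a finite set of observables; return cumulative expected regret per round."""
    rng = random.Random(seed)
    means = [expectation(bloch, a) for a in axes]  # used only to score regret, hidden from the learner
    best = max(means)
    counts = [0] * len(axes)
    sums = [0.0] * len(axes)
    regret, history = 0.0, []
    for t in range(1, rounds + 1):
        if t <= len(axes):
            arm = t - 1  # initial pass: measure each observable once
        else:
            # Optimistic index: empirical mean plus a confidence bonus.
            # Rewards lie in [-1, 1] (range 2), so the usual UCB1 bonus is scaled by 2.
            arm = max(
                range(len(axes)),
                key=lambda a: sums[a] / counts[a]
                + 2.0 * math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = measure(bloch, axes[arm], rng)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # gap between the best and the chosen expectation value
        history.append(regret)
    return history

if __name__ == "__main__":
    # Hypothetical example: a mixed state polarized along z, with the three Pauli observables as actions.
    paulis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    history = run_ucb((0.0, 0.0, 0.9), paulis, rounds=2000)
    print(f"cumulative regret after {len(history)} rounds: {history[-1]:.1f}")
```

The continuous-action, pure-state setting described in the abstract replaces the per-arm averages with a weighted least squares estimate of the state, which this sketch does not attempt to reproduce.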
