Josep Lumbreras

QUANT-PH
h-index10
8papers
42citations
Novelty62%
AI Score48

8 Papers

QUANT-PHJan 31, 2023
Quantum contextual bandits and recommender systems for quantum data

Shrigyan Brahmachari, Josep Lumbreras, Marco Tomamichel

We study a recommender system for quantum data using the linear contextual bandit framework. In each round, a learner receives an observable (the context) and has to recommend from a finite set of unknown quantum states (the actions) which one to measure. The learner has the goal of maximizing the reward in each round, that is the outcome of the measurement on the unknown state. Using this model we formulate the low energy quantum state recommendation problem where the context is a Hamiltonian and the goal is to recommend the state with the lowest energy. For this task, we study two families of contexts: the Ising model and a generalized cluster model. We observe that if we interpret the actions as different phases of the models then the recommendation is done by classifying the correct phase of the given Hamiltonian and the strategy can be interpreted as an online quantum phase classifier.

96.2QUANT-PHMar 26
Reinforcement learning for quantum processes with memory

Josep Lumbreras, Ruo Cheng Huang, Yanglin Hu et al.

In reinforcement learning, an agent interacts sequentially with an environment to maximize a reward, receiving only partial, probabilistic feedback. This creates a fundamental exploration-exploitation trade-off: the agent must explore to learn the hidden dynamics while exploiting this knowledge to maximize its target objective. While extensively studied classically, applying this framework to quantum systems requires dealing with hidden quantum states that evolve via unknown dynamics. We formalize this problem via a framework where the environment maintains a hidden quantum memory evolving via unknown quantum channels, and the agent intervenes sequentially using quantum instruments. For this setting, we adapt an optimistic maximum-likelihood estimation algorithm. We extend the analysis to continuous action spaces, allowing us to model general positive operator-valued measures (POVMs). By controlling the propagation of estimation errors through quantum channels and instruments, we prove that the cumulative regret of our strategy scales as $\widetilde{\mathcal{O}}(\sqrt{K})$ over $K$ episodes. Furthermore, via a reduction to the multi-armed quantum bandit problem, we establish information-theoretic lower bounds demonstrating that this sublinear scaling is strictly optimal up to polylogarithmic factors. As a physical application, we consider state-agnostic work extraction. When extracting free energy from a sequence of non-i.i.d. quantum states correlated by a hidden memory, any lack of knowledge about the source leads to thermodynamic dissipation. In our setting, the mathematical regret exactly quantifies this cumulative dissipation. Using our adaptive algorithm, the agent uses past energy outcomes to improve its extraction protocol on the fly, achieving sublinear cumulative dissipation, and, consequently, an asymptotically zero dissipation rate.

51.0QUANT-PHMay 9
Learning Pure Quantum States in Any Dimension (Almost) Without Regret

Josep Lumbreras, Marco Tomamichel

We extend quantum state tomography with minimal cumulative disturbance, first investigated in [arXiv:2406.18370], to arbitrary finite-dimensional pure states. A learner sequentially receives fresh copies of an unknown pure state, chooses a rank-one projector for each copy using the previous outcomes, and performs the corresponding two-outcome projective measurement. The goal is to learn the state while keeping the chosen projectors close to the unknown state in order to minimize disturbance. The qubit solution relies on the special geometry of the Bloch sphere and does not extend directly to qudits, where pure states form a curved manifold. We show that this obstruction can be overcome by working locally on the pure-state manifold. The algorithm proceeds in epochs. In each epoch, it fixes a current estimate, measures pairs of nearby rank-one projectors obtained by moving in opposite tangent directions, and takes differences of the corresponding outcomes. This gives an exact linear observation of the tangent component of the error. The resulting local linear models are combined with a robust variance-adaptive estimator and a hot-start regularization that transfers precision across epochs. For every unknown pure state in dimension \(d\), after \(T\) measured copies, our protocol achieves cumulative regret \(\mathcal{O}(d^3\log^2 T)\), and at each intermediate time \(t\leq T\) its current estimate has online infidelity \(\mathcal{O}(d^3\log(T)/t)\). Hence, pure-state tomography with essentially no cumulative disturbance is not a peculiarity of qubits but a geometric phenomenon that persists for qudits.

QUANT-PHDec 12, 2023
Learning finitely correlated states: stability of the spectral reconstruction

Marco Fanizza, Niklas Galke, Josep Lumbreras et al.

Matrix product operators allow efficient descriptions (or realizations) of states on a 1D lattice. We consider the task of learning a realization of minimal dimension from copies of an unknown state, such that the resulting operator is close to the density matrix in trace norm. For finitely correlated translation-invariant states on an infinite chain, a realization of minimal dimension can be exactly reconstructed via linear algebra operations from the marginals of a size depending on the representation dimension. We establish a bound on the trace norm error for an algorithm that estimates a candidate realization from estimates of these marginals and outputs a matrix product operator, estimating the state of a chain of arbitrary length $t$. This bound allows us to establish an $O(t^2)$ upper bound on the sample complexity of the learning task, with an explicit dependence on the site dimension, realization dimension and spectral properties of a certain map constructed from the state. A refined error bound can be proven for $C^*$-finitely correlated states, which have an operational interpretation in terms of sequential quantum channels applied to the memory system. We can also obtain an analogous error bound for a class of matrix product density operators on a finite chain reconstructible by local marginals. In this case, a linear number of marginals must be estimated, obtaining a sample complexity of $\tilde{O}(t^3)$. The learning algorithm also works for states that are sufficiently close to a finitely correlated state, with the potential of providing competitive algorithms for other interesting families of states.

LGFeb 19, 2024
Linear bandits with polylogarithmic minimax regret

Josep Lumbreras, Marco Tomamichel

We study a noise model for linear stochastic bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector. We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast the square root scaling of this regret for typical bandit algorithms. Our strategy, based on weighted least-squares estimation, achieves the eigenvalue relation $λ_{\min} ( V_t ) = Ω(\sqrt{λ_{\max}(V_t ) })$ for the design matrix $V_t$ at each time step $t$ through geometrical arguments that are independent of the noise model and might be of independent interest. This allows us to tightly control the expected regret in each time step to be of the order $O(\frac1{t})$, leading to the logarithmic scaling of the cumulative regret.

QUANT-PHMay 14, 2025
Quantum state-agnostic work extraction (almost) without dissipation

Josep Lumbreras, Ruo Cheng Huang, Yanglin Hu et al.

We investigate work extraction protocols designed to transfer the maximum possible energy to a battery using sequential access to $N$ copies of an unknown pure qubit state. The core challenge is designing interactions to optimally balance two competing goals: charging of the battery optimally using the qubit in hand, and acquiring more information by qubit to improve energy harvesting in subsequent rounds. Here, we leverage exploration-exploitation trade-off in reinforcement learning to develop adaptive strategies achieving energy dissipation that scales only poly-logarithmically in $N$. This represents an exponential improvement over current protocols based on full state tomography.

QUANT-PHSep 29, 2025
Bandits roaming Hilbert space

Josep Lumbreras

This thesis studies the exploration and exploitation trade-off in online learning of properties of quantum states using multi-armed bandits. Given streaming access to an unknown quantum state, in each round we select an observable from a set of actions to maximize its expectation value. Using past information, we refine actions to minimize regret; the cumulative gap between current reward and the maximum possible. We derive information-theoretic lower bounds and optimal strategies with matching upper bounds, showing regret typically scales as the square root of rounds. As an application, we reframe quantum state tomography to both learn the state efficiently and minimize measurement disturbance. For pure states and continuous actions, we achieve polylogarithmic regret using a sample-optimal algorithm based on a weighted online least squares estimator. The algorithm relies on the optimistic principle and controls the eigenvalues of the design matrix. We also apply our framework to quantum recommender systems and thermodynamic work extraction from unknown states. In this last setting, our results demonstrate an exponential advantage in work dissipation over tomography-based protocols.

QUANT-PHJun 26, 2024
Learning pure quantum states (almost) without regret

Josep Lumbreras, Mikhail Terekhov, Marco Tomamichel

We initiate the study of sample-optimal quantum state tomography with minimal disturbance to the samples. Can we efficiently learn a precise description of a quantum state through sequential measurements of samples while at the same time making sure that the post-measurement state of the samples is only minimally perturbed? Defining regret as the cumulative disturbance of all samples, the challenge is to find a balance between the most informative sequence of measurements on the one hand and measurements incurring minimal regret on the other. Here we answer this question for qubit states by exhibiting a protocol that for pure states achieves maximal precision while incurring a regret that grows only polylogarithmically with the number of samples, a scaling that we show to be optimal.