LG Sep 30, 2022
On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits
Antoine Barrier, Aurélien Garivier, Gilles Stoltz
We lay the foundations of a non-parametric theory of best-arm identification in multi-armed bandits with a fixed budget T. We consider general, possibly non-parametric, models D for distributions over the arms; an overarching example is the model D = P(0,1) of all probability distributions over [0,1]. We propose upper bounds on the average log-probability of misidentifying the optimal arm based on information-theoretic quantities that correspond to infima over Kullback-Leibler divergences between some distributions in D and a given distribution. This is made possible by a refined analysis of the successive-rejects strategy of Audibert, Bubeck, and Munos (2010). We finally provide lower bounds on the same average log-probability, also in terms of the same new information-theoretic quantities; these lower bounds are larger when the (natural) assumptions on the considered strategies are stronger. All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.
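As an illustration of the successive-rejects strategy that the abstract builds on, here is a minimal Python sketch following the standard description by Audibert, Bubeck, and Munos (2010): the budget is split into K - 1 phases, and the arm with the lowest empirical mean is dismissed at the end of each phase. The function name `successive_rejects`, the representation of arms as callables, and the Bernoulli demo are assumptions made for illustration; this sketch does not reproduce the refined analysis of the paper.

```python
import math
import random


def successive_rejects(arms, budget, rng=None):
    """Return the index of the arm recommended after at most `budget` pulls.

    `arms` is a list of callables taking a random.Random and returning a
    reward (a hypothetical interface chosen for this sketch).
    """
    rng = rng or random.Random(0)
    K = len(arms)
    if K == 1:
        return 0
    if budget <= K:
        raise ValueError("budget must exceed the number of arms")

    # Normalizing constant: 1/2 + sum_{i=2}^{K} 1/i
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))

    active = list(range(K))
    sums = [0.0] * K
    counts = [0] * K
    n_prev = 0

    for k in range(1, K):
        # Cumulative number of pulls per surviving arm at the end of phase k
        n_k = math.ceil((budget - K) / (log_bar * (K + 1 - k)))
        for a in active:
            for _ in range(n_k - n_prev):
                sums[a] += arms[a](rng)
                counts[a] += 1
        n_prev = n_k
        # Dismiss the arm with the lowest empirical mean
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)

    return active[0]


if __name__ == "__main__":
    # Bernoulli arms with illustrative means, i.e. distributions over [0, 1]
    means = [0.3, 0.5, 0.7]
    bernoulli_arms = [
        (lambda r, m=m: 1.0 if r.random() < m else 0.0) for m in means
    ]
    print(successive_rejects(bernoulli_arms, budget=3000))
```

The phase lengths are chosen so that the total number of pulls never exceeds the budget, which is why the only requirement checked here is that the budget is larger than the number of arms.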
NC Jul 15, 2024
MARVEL: MR Fingerprinting with Additional micRoVascular Estimates using bidirectional LSTMs
Antoine Barrier, Thomas Coudert, Aurélien Delphin et al.
The Magnetic Resonance Fingerprinting (MRF) approach aims to estimate multiple MR or physiological parameters simultaneously with a single fast acquisition sequence. Most of the MRF studies proposed so far have used simple MR sequence types to measure relaxation times (T1, T2). In that case, deep learning algorithms have been successfully used to speed up the reconstruction process. In theory, the MRF concept could be used with a variety of other MR sequence types and should be able to provide more information about the tissue microstructures. Yet, increasing the complexity of the numerical models often leads to prohibitive simulation times, and estimating multiple parameters from one sequence implies new dictionary dimensions whose sizes become too large for standard computers and DL architectures. In this paper, we propose to analyze the MRF signal coming from a complex balanced steady-state free precession (bSSFP) type sequence to simultaneously estimate relaxometry maps (T1, T2), field maps (B1, B0), as well as microvascular properties such as the local Cerebral Blood Volume (CBV) or the averaged vessel Radius (R). To bypass the curse of dimensionality, we propose an efficient way to simulate the MR signal coming from numerical voxels containing realistic microvascular networks, as well as a Bidirectional Long Short-Term Memory network used for the matching process. On top of standard MRF maps, our results on 3 human volunteers suggest that our approach can quickly produce high-quality quantitative maps of microvascular parameters that are otherwise obtained using longer dedicated sequences and intravenous injection of a contrast agent. This approach could be used for the management of multiple pathologies and could be tuned to provide other types of microstructural information.
LG Oct 16, 2024
Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach
Henrique Donâncio, Antoine Barrier, Leah F. South et al.
In deep Reinforcement Learning (RL), the learning rate critically influences both stability and performance, yet its optimal value shifts during training as the environment and policy evolve. Standard decay schedulers assume monotonic convergence and often misalign with these dynamics, leading to premature or delayed adjustments. We introduce LRRL, a meta-learning approach that dynamically selects the learning rate based on policy performance rather than training steps. LRRL adaptively favors rates that improve returns, remaining robust even when the candidate set includes values that individually cause divergence. Across Atari and MuJoCo benchmarks, LRRL achieves performance competitive with or superior to tuned baselines and standard schedulers. Our findings position LRRL as a practical solution for adapting to non-stationary objectives in deep RL.
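To make the idea of selecting a learning rate from performance feedback more concrete, the following Python sketch treats each candidate learning rate as an arm of an adversarial bandit and uses the classical Exp3 update. The abstract does not specify which bandit algorithm LRRL relies on, so the choice of Exp3, the class name `Exp3LearningRateSelector`, the exploration parameter `gamma`, and the assumption that feedback is rescaled to [0, 1] are all illustrative assumptions rather than the authors' method.

```python
import math
import random


class Exp3LearningRateSelector:
    """Exp3 over a finite set of candidate learning rates (illustrative)."""

    def __init__(self, rates, gamma=0.1, seed=0):
        self.rates = list(rates)
        self.gamma = gamma  # exploration parameter, assumed in (0, 1]
        self.weights = [1.0] * len(self.rates)
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the normalized weights with uniform exploration."""
        total = sum(self.weights)
        k = len(self.rates)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / k
            for w in self.weights
        ]

    def select(self):
        """Sample an arm; return its index and its sampling probability."""
        probs = self.probabilities()
        idx = self.rng.choices(range(len(self.rates)), weights=probs)[0]
        return idx, probs[idx]

    def update(self, idx, prob, reward):
        """Update with a reward assumed to lie in [0, 1]."""
        estimate = reward / prob  # importance-weighted reward estimate
        k = len(self.rates)
        self.weights[idx] *= math.exp(self.gamma * estimate / k)
        # Rescale to avoid overflow; probabilities are unchanged by this
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


if __name__ == "__main__":
    selector = Exp3LearningRateSelector([1e-2, 1e-3, 1e-4])
    for _ in range(200):
        idx, prob = selector.select()
        # Placeholder feedback: in practice this would be a rescaled
        # measure of policy performance after training with rates[idx].
        reward = 1.0 if idx == 1 else 0.0
        selector.update(idx, prob, reward)
    print(selector.probabilities())
```

Because every arm keeps a sampling probability of at least `gamma / k`, a rate that performs poorly early on can still be revisited later, which suits a setting where the best learning rate shifts during training.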
ST May 27, 2021
A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits
Antoine Barrier, Aurélien Garivier, Tomáš Kocák
We propose a new strategy for best-arm identification with fixed confidence for Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: it is, to the best of our knowledge, the first strategy with non-asymptotic bounds that asymptotically matches the sample complexity. But the main advantage over other algorithms like Track-and-Stop is an improved behavior regarding exploration: Exploration-Biased Sampling is biased towards exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are made possible by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe are of high independent interest.