LGApr 22
Cover meets Robbins while Betting on Bounded Data: $\ln n$ Regret and Almost Sure $\ln\ln n$ RegretShubhada Agrawal, Aaditya Ramdas
Consider betting against a sequence of data in $[0,1]$, where one is allowed to make any bet that is fair if the data have a conditional mean $m_0 \in (0,1)$. Cover's universal portfolio algorithm delivers a worst-case regret of $O(\ln n)$ compared to the best constant bet in hindsight, and this bound is unimprovable against adversarially generated data. In this work, we present a novel mixture betting strategy that combines insights from Robbins and Cover, and exhibits a different behavior: it eventually produces a regret of $O(\ln \ln n)$ on \emph{almost} all paths (a measure-one set of paths if each conditional mean equals $m_0$ and intrinsic variance increases to $\infty$), but has an $O(\log n)$ regret on the complement (a measure zero set of paths). Our paper appears to be the first to point out the value in hedging two very different strategies to achieve a best-of-both-worlds adaptivity to stochastic data and protection against adversarial data. We contrast our results to those in~\cite{agrawal2025regret} for a sub-Gaussian mixture on unbounded data: their worst-case regret has to be unbounded, but a similar hedging delivers both an optimal betting growth-rate and an almost sure $\ln\ln n$ regret on stochastic data. Finally, our strategy witnesses a sharp game-theoretic upper law of the iterated logarithm, analogous to~\cite{shafer2005probability}.
LGJun 15, 2023
Optimal Best-Arm Identification in Bandits with Access to Offline DataShubhada Agrawal, Sandeep Juneja, Karthikeyan Shanmugam et al.
Learning paradigms based purely on offline data as well as those based solely on sequential online learning have been well-studied in the literature. In this paper, we consider combining offline data with online learning, an area less studied but of obvious practical importance. We consider the stochastic $K$-armed bandit problem, where our goal is to identify the arm with the highest mean in the presence of relevant offline data, with confidence $1-δ$. We conduct a lower bound analysis on policies that provide such $1-δ$ probabilistic correctness guarantees. We develop algorithms that match the lower bound on sample complexity when $δ$ is small. Our algorithms are computationally efficient with an average per-sample acquisition cost of $\tilde{O}(K)$, and rely on a careful characterization of the optimality conditions of the lower bound problem.
MLSep 28, 2023
CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic CorruptionShubhada Agrawal, Timothée Mathieu, Debabrota Basu et al.
We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in Gaussian model.
PRMay 20
Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian NoiseShubhada Agrawal, Siva Theja Maguluri, Martin Zubeldia
We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. When the Martingale-difference noise is bounded, we show that the tail of the error can be sub-Gaussian, sub-Weibull, or something lighter than any Pareto but heavier than any Weibull, depending on the step size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. Our analysis relies on a novel Lyapunov function involving the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. We complement the upper bounds with worst-case examples showing that qualitatively sharper bounds are impossible. We further study the case of unbounded Martingale-difference noise when the average operator is contractive, and the step sizes are of order $1/k$. In this setting, we show that if the random operator is almost surely non-expansive, then the error tail is at most three times heavier than the noise tail, whereas if the random operator is expansive with positive probability, then the error may have substantially heavier tails. These results are obtained through a novel black-box truncation argument that reduces the unbounded-noise setting to the bounded-noise case.
ITApr 16
Regret Tail Characterization of Optimal Bandit Algorithms with Generic RewardsSubhodip Panda, Shubhada Agrawal
We study the tail behavior of regret in stochastic multi-armed bandits for algorithms that are asymptotically optimal in expectation. While minimizing expected regret is the classical objective, recent work shows that even such algorithms can exhibit heavy regret tails, incurring large regret with non-negligible probability. Existing sharp characterizations of regret tails are largely restricted to parametric settings, such as single-parameter exponential families. In this work, we extend the $\KLinf$-UCB algorithm of to a broad nonparametric class of reward distributions satisfying mild assumptions, and establish its asymptotic optimality in expectation. We then analyze the tail behavior of its regret and derive a novel upper bound on the regret tail probability. As special cases, our results recover regret-tail guarantees for both bounded-support and heavy-tailed (moment-bounded) bandit models. Moreover, for the special case of finitely-supported reward distributions, our upper bound matches the known lower bound exactly. Our results thus provide a unified and tight characterization of regret tails for asymptotically optimal KL-based UCB algorithms, going beyond parametric models.
ITMar 22
Dual Representation of Minimum Divergence Under Integral ConstraintsShubhanshu Shekhar, Shubhada Agrawal
Minimum divergence problems under integral constraints appear throughout statistics and probability, including sequential inference, bandit theory, and distributionally robust optimization. In many such settings, dual representations are the key step that convert information-theoretic lower bounds into computationally tractable (and often near-optimal) algorithms. In this paper, we present a general two-stage recipe for deriving dual representations of constrained minimum divergence (in the second argument) for distributions supported on $[0,1]^K$. The first stage derives a dual representation for finitely-supported distributions using classical finite-dimensional convex duality techniques, while the second establishes an abstract interchange argument that lifts this discretized dual to arbitrary distributions. We begin with the simplest case of mean-constrained minimum relative entropy, commonly called $\mathrm{KL}_{\inf}$, and generalize an existing argument from multi-armed bandits literature for $K=1$ to arbitrary dimensions. Our main contribution is to significantly expand the scope of this approach to a broad class of $f$-divergences (beyond relative entropy) and to general integral constraint functionals (beyond the mean constraint). Finally, we illustrate the statistical implications of our results by constructing optimal procedures for sequential testing, estimation, and change detection with observations in $[0,1]^K$.
STFeb 19
Asymptotically Optimal Sequential Testing with Markovian DataAlhad Sethi, Kavali Sofia Sagar, Shubhada Agrawal et al.
We study one-sided and $α$-correct sequential hypothesis testing for data generated by an ergodic Markov chain. The null hypothesis is that the unknown transition matrix belongs to a prescribed set $P$ of stochastic matrices, and the alternative corresponds to a disjoint set $Q$. We establish a tight non-asymptotic instance-dependent lower bound on the expected stopping time of any valid sequential test under the alternative. Our novel analysis improves the existing lower bounds, which are either asymptotic or provably sub-optimal in this setting. Our lower bound incorporates both the stationary distribution and the transition structure induced by the unknown Markov chain. We further propose an optimal test whose expected stopping time matches this lower bound asymptotically as $α\to 0$. We illustrate the usefulness of our framework through applications to sequential detection of model misspecification in Markov Chain Monte Carlo and to testing structural properties, such as the linearity of transition dynamics, in Markov decision processes. Our findings yield a sharp and general characterization of optimal sequential testing procedures under Markovian dependence.
STApr 28, 2025
On Stopping Times of Power-one Sequential Tests: Tight Lower and Upper BoundsShubhada Agrawal, Aaditya Ramdas
We prove two lower bounds for stopping times of sequential tests between general composite nulls and alternatives. The first lower bound is for the setting where the type-1 error level $α$ approaches zero, and equals $\log(1/α)$ divided by a certain infimum KL divergence, termed $\operatorname{KL_{inf}}$. The second lower bound applies to the setting where $α$ is fixed and $\operatorname{KL_{inf}}$ approaches 0 (meaning that the null and alternative sets are not separated) and equals $c \operatorname{KL_{inf}}^{-1} \log \log \operatorname{KL_{inf}}^{-1}$ for a universal constant $c > 0$. We also provide a sufficient condition for matching the upper bounds and show that this condition is met in several special cases. Given past work, these upper and lower bounds are unsurprising in their form; our main contribution is the generality in which they hold, for example, not requiring reference measures or compactness of the classes.
LGDec 13, 2025
Eventually LIL Regret: Almost Sure $\ln\ln T$ Regret for a sub-Gaussian Mixture on Unbounded DataShubhada Agrawal, Aaditya Ramdas
We prove that a classic sub-Gaussian mixture proposed by Robbins in a stochastic setting actually satisfies a path-wise (deterministic) regret bound. For every path in a natural ``Ville event'' $E_α$, this regret till time $T$ is bounded by $\ln^2(1/α)/V_T + \ln (1/α) + \ln \ln V_T$ up to universal constants, where $V_T$ is a nonnegative, nondecreasing, cumulative variance process. (The bound reduces to $\ln(1/α) + \ln \ln V_T$ if $V_T \geq \ln(1/α)$.) If the data were stochastic, then one can show that $E_α$ has probability at least $1-α$ under a wide class of distributions (eg: sub-Gaussian, symmetric, variance-bounded, etc.). In fact, we show that on the Ville event $E_0$ of probability one, the regret on every path in $E_0$ is eventually bounded by $\ln \ln V_T$ (up to constants). We explain how this work helps bridge the world of adversarial online learning (which usually deals with regret bounds for bounded data), with game-theoretic statistics (which can handle unbounded data, albeit using stochastic assumptions). In short, conditional regret bounds serve as a bridge between stochastic and adversarial betting.
LGMar 14, 2024
Optimal Top-Two Method for Best Arm Identification and Fluid AnalysisAgniv Bandyopadhyay, Sandeep Juneja, Shubhada Agrawal
Top-$2$ methods have become popular in solving the best arm identification (BAI) problem. The best arm, or the arm with the largest mean amongst finitely many, is identified through an algorithm that at any sequential step independently pulls the empirical best arm, with a fixed probability $β$, and pulls the best challenger arm otherwise. The probability of incorrect selection is guaranteed to lie below a specified $δ>0$. Information theoretic lower bounds on sample complexity are well known for BAI problem and are matched asymptotically as $δ\rightarrow 0$ by computationally demanding plug-in methods. The above top 2 algorithm for any $β\in (0,1)$ has sample complexity within a constant of the lower bound. However, determining the optimal $β$ that matches the lower bound has proven difficult. In this paper, we address this and propose an optimal top-2 type algorithm. We consider a function of allocations anchored at a threshold. If it exceeds the threshold then the algorithm samples the empirical best arm. Otherwise, it samples the challenger arm. We show that the proposed algorithm is optimal as $δ\rightarrow 0$. Our analysis relies on identifying a limiting fluid dynamics of allocations that satisfy a series of ordinary differential equations pasted together and that describe the asymptotic path followed by our algorithm. We rely on the implicit function theorem to show existence and uniqueness of these fluid ode's and to show that the proposed algorithm remains close to the ode solution.
LGFeb 7, 2021
Regret Minimization in Heavy-Tailed BanditsShubhada Agrawal, Sandeep Juneja, Wouter M. Koolen
We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order $(1+ε)$ are uniformly bounded by a known constant B, for some given $ε> 0$. We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.
LGAug 17, 2020
Optimal Best-Arm Identification Methods for Tail-Risk MeasuresShubhada Agrawal, Wouter M. Koolen, Sandeep Juneja
Conditional value-at-risk (CVaR) and value-at-risk (VaR) are popular tail-risk measures in finance and insurance industries as well as in highly reliable, safety-critical uncertain environments where often the underlying probability distributions are heavy-tailed. We use the multi-armed bandit best-arm identification framework and consider the problem of identifying the arm from amongst finitely many that has the smallest CVaR, VaR, or weighted sum of CVaR and mean. The latter captures the risk-return trade-off common in finance. Our main contribution is an optimal $δ$-correct algorithm that acts on general arms, including heavy-tailed distributions, and matches the lower bound on the expected number of samples needed, asymptotically (as $δ$ approaches $0$). The algorithm requires solving a non-convex optimization problem in the space of probability measures, that requires delicate analysis. En-route, we develop new non-asymptotic empirical likelihood-based concentration inequalities for tail-risk measures which are tighter than those for popular truncation-based empirical estimators.
LGAug 24, 2019
Optimal $δ$-Correct Best-Arm Selection for Heavy-Tailed DistributionsShubhada Agrawal, Sandeep Juneja, Peter Glynn
Given a finite set of unknown distributions or arms that can be sampled, we consider the problem of identifying the one with the maximum mean using a $δ$-correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified $δ$) that has minimum sample complexity. Lower bounds for $δ$-correct algorithms are well known. $δ$-correct algorithms that match the lower bound asymptotically as $δ$ reduces to zero have been previously developed when arm distributions are restricted to a single parameter exponential family. In this paper, we first observe a negative result that some restrictions are essential, as otherwise, under a $δ$-correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a $δ$-correct algorithm that matches the lower bound as $δ$ reduces to zero under the mild restriction that a known bound on the expectation of $(1+ε)^{th}$ moment of the underlying random variables exists, for $ε> 0$. We also propose batch processing and identify near-optimal batch sizes to speed up the proposed algorithm substantially. The best-arm problem has many learning applications, including recommendation systems and product selection. It is also a well-studied classic problem in the simulation community.