Arya Akhavan

ML
h-index14
7papers
65citations
Novelty51%
AI Score42

7 Papers

MLJun 7, 2022
Group Meritocratic Fairness in Linear Contextual Bandits

Riccardo Grazzi, Arya Akhavan, John Isak Texas Falk et al.

We study the linear contextual bandit problem where an agent has to select one candidate from a pool and each candidate belongs to a sensitive group. In this setting, candidates' rewards may not be directly comparable between groups, for example when the agent is an employer hiring candidates from different ethnic groups and some groups have a lower reward due to discriminatory bias and/or social injustice. We propose a notion of fairness that states that the agent's policy is fair when it selects a candidate with highest relative rank, which measures how good the reward is when compared to candidates from the same group. This is a very strong notion of fairness, since the relative rank is not directly observed by the agent and depends on the underlying reward model and on the distribution of rewards. Thus we study the problem of learning a policy which approximates a fair policy under the condition that the contexts are independent between groups and the distribution of rewards of each group is absolutely continuous. In particular, we design a greedy policy which at each round constructs a ridge regression estimate from the observed context-reward pairs, and then computes an estimate of the relative rank of each candidate using the empirical cumulative distribution function. We prove that, despite its simplicity and the lack of an initial exploration phase, the greedy policy achieves, up to log factors and with high probability, a fair pseudo-regret of order $\sqrt{dT}$ after $T$ rounds, where $d$ is the dimension of the context vectors. The policy also satisfies demographic parity at each round when averaged over all possible information available before the selection. Finally, we use simulated settings and experiments on the US census data to show that our policy achieves sub-linear fair pseudo-regret also in practice.

MLJun 3, 2025
Non-stationary Bandit Convex Optimization: A Comprehensive Study

Xiaoqi Liu, Dorian Baudry, Julian Zimmert et al.

Bandit Convex Optimization is a fundamental class of sequential decision-making problems, where the learner selects actions from a continuous domain and observes a loss (but not its gradient) at only one point per round. We study this problem in non-stationary environments, and aim to minimize the regret under three standard measures of non-stationarity: the number of switches $S$ in the comparator sequence, the total variation $Δ$ of the loss functions, and the path-length $P$ of the comparator sequence. We propose a polynomial-time algorithm, Tilted Exponentially Weighted Average with Sleeping Experts (TEWA-SE), which adapts the sleeping experts framework from online convex optimization to the bandit setting. For strongly convex losses, we prove that TEWA-SE is minimax-optimal with respect to known $S$ and $Δ$ by establishing matching upper and lower bounds. By equipping TEWA-SE with the Bandit-over-Bandit framework, we extend our analysis to environments with unknown non-stationarity measures. For general convex losses, we introduce a second algorithm, clipped Exploration by Optimization (cExO), based on exponential weights over a discretized action space. While not polynomial-time computable, this method achieves minimax-optimal regret with respect to known $S$ and $Δ$, and improves on the best existing bounds with respect to $P$.

MLOct 28, 2025
Self-Concordant Perturbations for Linear Bandits

Lucas Lévy, Jean-Lou Valeau, Arya Akhavan et al.

We study the adversarial linear bandits problem and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, extending the known connection between them from the full-information setting. Within this framework, we introduce self-concordant perturbations, a family of probability distributions that mirror the role of self-concordant barriers previously employed in the FTRL-based SCRiBLe algorithm. Using this idea, we design a novel FTPL-based algorithm that combines self-concordant regularization with efficient stochastic exploration. Our approach achieves a regret of $O(d\sqrt{n \ln n})$ on both the $d$-dimensional hypercube and the Euclidean ball. On the Euclidean ball, this matches the rate attained by existing self-concordant FTRL methods. For the hypercube, this represents a $\sqrt{d}$ improvement over these methods and matches the optimal bound up to logarithmic factors.

LGSep 25, 2025
High-Probability Analysis of Online and Federated Zero-Order Optimisation

Arya Akhavan, David Janz, El-Mahdi El-Mhamdi

We study distributed learning in the context of gradient-free zero-order optimisation and introduce FedZero, a federated zero-order algorithm with sharp theoretical guarantees. Our contributions are threefold. First, in the federated convex setting, we derive high-probability guarantees for regret minimisation achieved by FedZero. Second, in the single-worker regime, corresponding to the classical zero-order framework with two-point feedback, we establish the first high-probability convergence guarantees for convex zero-order optimisation, strengthening previous results that held only in expectation. Third, to establish these guarantees, we develop novel concentration tools: (i) concentration inequalities with explicit constants for Lipschitz functions under the uniform measure on the $\ell_1$-sphere, and (ii) a time-uniform concentration inequality for squared sub-Gamma random variables. These probabilistic results underpin our high-probability guarantees and may also be of independent interest.

MLMar 3, 2025
Gradient-free stochastic optimization for additive models

Arya Akhavan, Alexandre B. Tsybakov

We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Łojasiewicz or the strong convexity condition. Additionally, we assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the Hölder family of functions. The additive model for Hölder classes of functions is well-studied in the literature on nonparametric function estimation, where it is shown that such a model benefits from a substantial improvement of the estimation accuracy compared to the Hölder model without additive structure. We study this established framework in the context of gradient-free optimization. We propose a randomized gradient estimator that, when plugged into a gradient descent algorithm, allows one to achieve minimax optimal optimization error of the order $dT^{-(β-1)/β}$, where $d$ is the dimension of the problem, $T$ is the number of queries and $β\ge 2$ is the Hölder degree of smoothness. We conclude that, in contrast to nonparametric estimation problems, no substantial gain of accuracy can be achieved when using additive models in gradient-free optimization.

MLJun 9, 2024
A conversion theorem and minimax optimality for continuum contextual bandits

Arya Akhavan, Karim Lounici, Massimiliano Pontil et al.

We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all the underlying functions for the received contexts, leading to the contextual notion of regret, which is stronger than the standard static regret. Assuming that the objective functions are $γ$-Hölder with respect to the contexts, $0<γ\le 1,$ we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear contextual regret. We prove a static-to-contextual regret conversion theorem that provides an upper bound for the contextual regret of the output algorithm as a function of the static regret of the input algorithm. We further study the implications of this general result for three fundamental cases of dependency of the objective function on the action variable: (a) Lipschitz bandits, (b) convex bandits, (c) strongly convex and smooth bandits. For Lipschitz bandits and $γ=1,$ combining our results with the lower bound of Slivkins (2014), we prove that the minimax optimal contextual regret for the noise-free adversarial setting is achieved. Then, we prove that in the presence of noise, the contextual regret rate as a function of the number of queries is the same for convex bandits as it is for strongly convex and smooth bandits. Lastly, we present a minimax lower bound, implying two key facts. First, obtaining a sub-linear contextual regret may be impossible over functions that are not continuous with respect to the context. Second, for convex bandits and strongly convex and smooth bandits, the algorithms that we propose achieve, up to a logarithmic factor, the minimax optimal rate of contextual regret as a function of the number of queries.

LGJun 14, 2020
Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits

Arya Akhavan, Massimiliano Pontil, Alexandre B. Tsybakov

We study the problem of zero-order optimization of a strongly convex function. The goal is to find the minimizer of the function by a sequential exploration of its values, under measurement noise. We study the impact of higher order smoothness properties of the function on the optimization error and on the cumulative regret. To solve this problem we consider a randomized approximation of the projected gradient descent algorithm. The gradient is estimated by a randomized procedure involving two function evaluations and a smoothing kernel. We derive upper bounds for this algorithm both in the constrained and unconstrained settings and prove minimax lower bounds for any sequential search method. Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters. Based on this algorithm, we also propose an estimator of the minimum value of the function achieving almost sharp oracle behavior. We compare our results with the state-of-the-art, highlighting a number of key improvements.