Mario Bravo

GT
h-index: 3
5 papers
209 citations
Novelty: 53%
AI Score: 31

5 Papers

ML · Jan 7, 2025
Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity

Mario Bravo, Juan P. Flores-Mella, Cristóbal Guzmán

We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA which are, in some important cases, dimension-free and poly-logarithmic in the accuracy, closely matching the existing results in the smooth convex case. Additionally, we establish new upper bounds for the privacy curve of the subsampled noisy SGD algorithm. These bounds show a crucial dependency on the regularity of gradients, and are useful for a wide range of convex losses beyond the smooth case. Our analysis relies on a suitable extension of the Privacy Amplification by Iteration (PABI) framework (Feldman et al., 2018; Altschuler and Talwar, 2022, 2023) to noisy iterations whose gradient map is not necessarily nonexpansive. This extension is achieved by designing an optimization problem which accounts for the best possible Rényi divergence bound obtained by an application of PABI, where the tractability of the problem is crucially related to the modulus of continuity of the associated gradient mapping. We show that, in several interesting cases, namely the nonsmooth convex, weakly smooth, and (strongly) dissipative ones, this optimization problem can be solved exactly and explicitly, yielding the tightest possible PABI-based bounds.
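For orientation, the textbook form of a projected Langevin step is x ← Proj(x − η∇f(x) + √(2η)·ξ) with ξ standard Gaussian. The sketch below illustrates that generic update only; the abstract does not specify the constraint set, step size, or potential, so the Euclidean ball, the value `step=0.01`, and the nonsmooth potential f(x) = Σ|x_i| are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random

def project_onto_ball(x, radius):
    """Euclidean projection of x onto the origin-centered ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [v * scale for v in x]

def projected_langevin(grad, x0, step, radius, n_steps, rng):
    """Iterate x <- Proj(x - step * grad(x) + sqrt(2 * step) * gaussian_noise)."""
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        g = grad(x)
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
        x = project_onto_ball(x, radius)
    return x

def l1_subgradient(x):
    """A subgradient of the (assumed) nonsmooth convex potential f(x) = sum_i |x_i|."""
    return [(v > 0) - (v < 0) for v in x]

# Assumed illustrative setup: unit ball in R^2, fixed seed for reproducibility.
rng = random.Random(0)
sample = projected_langevin(l1_subgradient, [0.5, -0.5],
                            step=0.01, radius=1.0, n_steps=1000, rng=rng)
```

Because the last operation in each step is the projection, every iterate (including `sample`) stays inside the constraint set regardless of the noise draw.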

OC · Mar 19, 2024
Stochastic Halpern iteration in normed spaces and applications to reinforcement learning

Mario Bravo, Juan Pablo Contreras

We analyze the oracle complexity of the stochastic Halpern iteration with minibatch, where we aim to approximate fixed-points of nonexpansive and contractive operators in a normed finite-dimensional space. We show that if the underlying stochastic oracle has uniformly bounded variance, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$ to obtain an $\varepsilon$ expected fixed-point residual for nonexpansive operators, improving recent rates established for the stochastic Krasnoselskii-Mann iteration. Also, we establish a lower bound of $\Omega(\varepsilon^{-3})$ which applies to a wide range of algorithms, including all averaged iterations even with minibatching. Using a suitable modification of our approach, we derive an $O(\varepsilon^{-2}(1-\gamma)^{-3})$ complexity bound for approximating the fixed-point when the operator is a $\gamma$-contraction. As an application, we propose new model-free algorithms for average and discounted reward MDPs. For the average reward case, our method applies to weakly communicating MDPs without requiring prior parameter knowledge.
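A Halpern-type iteration anchors each step to the starting point: x_{n+1} = β_n x_0 + (1 − β_n) T(x_n). The sketch below is a generic illustration of that idea with a minibatched noisy oracle; it is not the paper's exact scheme. The anchoring weights β_n = 1/(n+2), the linearly growing batch size, the additive Gaussian oracle, and the 90-degree rotation used as a nonexpansive operator are all assumptions made for the example.

```python
import math
import random

def rotate_quarter_turn(x):
    """A nonexpansive operator on R^2 (90-degree rotation); its only fixed point is 0."""
    return [-x[1], x[0]]

def noisy_minibatch_eval(operator, x, batch_size, noise_std, rng):
    """Average batch_size evaluations of operator(x) under assumed additive Gaussian noise."""
    tx = operator(x)
    total = [0.0] * len(tx)
    for _ in range(batch_size):
        for i, v in enumerate(tx):
            total[i] += v + noise_std * rng.gauss(0.0, 1.0)
    return [t / batch_size for t in total]

def stochastic_halpern(operator, x0, n_steps, noise_std, rng):
    """Iterate x <- beta * x0 + (1 - beta) * T_hat(x) with beta = 1 / (n + 2)."""
    x = list(x0)
    for n in range(n_steps):
        anchor_weight = 1.0 / (n + 2)   # assumed schedule
        batch_size = n + 1              # assumed growing minibatch
        tx = noisy_minibatch_eval(operator, x, batch_size, noise_std, rng)
        x = [anchor_weight * a + (1.0 - anchor_weight) * t
             for a, t in zip(x0, tx)]
    return x

def residual(operator, x):
    """Euclidean fixed-point residual ||x - T(x)||."""
    tx = operator(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, tx)))

# Noisy run with a fixed seed; the residual is random but typically small.
x_noisy = stochastic_halpern(rotate_quarter_turn, [1.0, 0.0], 200, 0.5, random.Random(0))
noisy_residual = residual(rotate_quarter_turn, x_noisy)
```

Setting `noise_std=0.0` recovers the deterministic Halpern iteration, which is a convenient sanity check: for this rotation the residual then decays on the order of 1/n.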

GT · Oct 3, 2018
Bandit learning in concave $N$-person games

Mario Bravo, David S. Leslie, Panayotis Mertikopoulos

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability $1$. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
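With bandit feedback a player sees only the realized payoff, so a gradient must be estimated from payoff values alone. One standard device, shown here as a hedged illustration and not necessarily the estimator used in the paper, is the single-point estimate (d/δ)·u(x + δz)·z with z drawn uniformly from the unit sphere. The linear payoff, the query point, and δ = 0.1 below are assumptions chosen so the true gradient (1, 2) is known.

```python
import math
import random

def sample_unit_sphere(dim, rng):
    """Draw a uniformly distributed point on the unit sphere by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def one_point_gradient_estimate(payoff, x, delta, rng):
    """Estimate the payoff gradient at x from a single payoff observation at x + delta * z."""
    dim = len(x)
    z = sample_unit_sphere(dim, rng)
    perturbed = [xi + delta * zi for xi, zi in zip(x, z)]
    value = payoff(perturbed)  # the only feedback the player observes
    return [(dim / delta) * value * zi for zi in z]

def linear_payoff(x):
    """Assumed toy payoff u(x) = x_0 + 2 * x_1, whose gradient is (1, 2) everywhere."""
    return 1.0 * x[0] + 2.0 * x[1]

# Average many single-point estimates at the origin to see that they center on the gradient.
rng = random.Random(0)
n_samples = 20000
avg = [0.0, 0.0]
for _ in range(n_samples):
    est = one_point_gradient_estimate(linear_payoff, [0.0, 0.0], 0.1, rng)
    avg = [a + e / n_samples for a, e in zip(avg, est)]
```

Each individual estimate is very noisy; only the average is close to the true gradient. That variance is the price of using one payoff observation per round, and a learning scheme built on such estimates has to control it through its choice of δ and step size.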

OC · Dec 20, 2014
On the robustness of learning in games with stochastically perturbed payoff observations

Mario Bravo, Panayotis Mertikopoulos

Motivated by the scarcity of accurate payoff feedback in practical applications of game theory, we examine a class of learning dynamics where players adjust their choices based on past payoff observations that are subject to noise and random disturbances. First, in the single-player case (corresponding to an agent trying to adapt to an arbitrarily changing environment), we show that the stochastic dynamics under study lead to no regret almost surely, irrespective of the noise level in the player's observations. In the multi-player case, we find that dominated strategies become extinct and we show that strict Nash equilibria are stochastically stable and attracting; conversely, if a state is stable or attracting with positive probability, then it is a Nash equilibrium. Finally, we provide an averaging principle for 2-player games, and we show that in zero-sum games with an interior equilibrium, time averages converge to Nash equilibrium for any noise level.

GT · Jun 12, 2013
Reinforcement learning with restrictions on the action set

Mario Bravo, Mathieu Faure

Consider a 2-player normal-form game repeated over time. We introduce an adaptive learning procedure, where the players only observe their own realized payoff at each stage. We assume that agents do not know their own payoff function, and have no information on the other player. Furthermore, we assume that they have restrictions on their own action set such that, at each stage, their choice is limited to a subset of their action set. We prove that the empirical distributions of play converge to the set of Nash equilibria for zero-sum games, potential games, and games where one player has two actions.