Vijeth Hebbar

LG
h-index: 2
3 papers
2 citations
Novelty: 48%
AI Score: 39

3 Papers

46.5 OC Apr 20
Steady-state Based Approach to Online Non-stochastic Control

Vijeth Hebbar, Spencer Hutchinson, Mahnoosh Alizadeh et al.

We study the problem of online non-stochastic control (ONC), which is the control of a linear system under adversarial disturbances and adversarial cost functions, with the aim of minimizing the total cost incurred. A recent line of literature in ONC develops algorithms that enjoy sublinear regret with respect to a benchmark based on the set of steady-states that are attainable by a constant input. In this work, we extend this research direction by giving an algorithm that enjoys $\mathcal{O}(\sqrt{T})$ regret with respect to a richer benchmark set, namely the set of steady-states attainable under an affine controller. Since this benchmark substantially broadens the comparison class, it provides significantly stronger performance guarantees. Our proposed algorithm combines a Follow-The-Perturbed-Leader-style online non-convex optimization approach with a batching method that maintains stability despite changing policies. Although our proposed algorithm requires solving non-convex subproblems, we show that an approximate solution to these subproblems is sufficient to ensure $\mathcal{O}(\sqrt{T})$ regret. Furthermore, numerical experiments show that, in certain settings, our algorithm incurs lower total cost than existing methods at comparable computational expense.
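To make the two benchmark sets in this abstract concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a disturbance-free linear system x_{t+1} = A x_t + B u_t and computes the steady state reached under a constant input u and under an affine controller u = K x + b. The matrices `A`, `B`, `K`, the offset `b`, and both function names are hypothetical values chosen for illustration; the paper's own definitions (in particular how disturbances enter) may differ.

```python
import numpy as np

def steady_state_constant(A, B, u):
    """Solve x = A x + B u for x, i.e. x = (I - A)^{-1} B u."""
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) - A, B @ u)

def steady_state_affine(A, B, K, b):
    """Solve x = A x + B (K x + b) for x, i.e. x = (I - A - B K)^{-1} B b.

    Returns the steady state and the input applied at that state.
    """
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n) - A - B @ K, B @ b)
    return x, K @ x + b

# Hypothetical example system (assumed values, not taken from the paper).
A = np.array([[0.5, 0.1], [0.0, 0.4]])
B = np.array([[1.0], [0.5]])
K = np.array([[-0.2, 0.1]])
b = np.array([1.0])

x_const = steady_state_constant(A, B, b)
x_aff, u_aff = steady_state_affine(A, B, K, b)
```

Setting `K` to zero recovers the constant-input case, which is one way to see why the affine benchmark set contains the constant-input one.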

LG Dec 11, 2023
Online Decision Making with History-Average Dependent Costs (Extended)

Vijeth Hebbar, Cedric Langbort

In many online sequential decision-making scenarios, a learner's choices affect not just their current costs but also future ones. In this work, we look at one particular case of such a situation, where the costs depend on the time average of past decisions over a history horizon. We first recast this problem with history-dependent costs as a problem of decision making under stage-wise constraints. To tackle this, we then propose the novel Follow-The-Adaptively-Regularized-Leader (FTARL) algorithm. Our algorithm incorporates adaptive regularizers that depend explicitly on past decisions, allowing us to enforce stage-wise constraints while simultaneously enabling us to establish tight regret bounds. We also discuss the implications of the length of the history horizon for the design of no-regret algorithms for our problem and present impossibility results when the history horizon equals the full learning horizon.
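As a small illustration of the quantity the costs depend on, the sketch below computes the running average of the most recent `H` decisions. It is not the FTARL algorithm. The function name `history_average`, the choice to include the current decision in the window, and the handling of the first few rounds (averaging over however many decisions exist) are all assumptions made for the example.

```python
from collections import deque

import numpy as np

def history_average(decisions, H):
    """Yield the average of the most recent H decisions.

    Assumption: the window includes the current decision, and during the
    first H - 1 rounds it averages over the decisions seen so far.
    """
    window = deque(maxlen=H)  # the oldest decision drops out automatically
    for x in decisions:
        window.append(np.asarray(x, dtype=float))
        yield sum(window) / len(window)

# Hypothetical scalar decisions with a history horizon of H = 2.
decisions = [1.0, 3.0, 5.0, 7.0]
averages = [float(a) for a in history_average(decisions, H=2)]
```

A cost of the form f_t applied to each value in `averages` then depends on earlier decisions as well as the current one, which is the coupling the abstract describes.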

LG Jun 11, 2025
"What are my options?": Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended)

Noel Brindise, Vijeth Hebbar, Riya Shah et al.

In this work, we provide an extended discussion of a new approach to explainable Reinforcement Learning called Diverse Near-Optimal Alternatives (DNA), first proposed at L4DC 2025. DNA seeks a set of reasonable "options" for trajectory-planning agents, optimizing policies to produce qualitatively diverse trajectories in Euclidean space. In the spirit of explainability, these distinct policies are used to "explain" an agent's options in terms of available trajectory shapes from which a human user may choose. In particular, DNA applies to value function-based policies on Markov decision processes where agents are limited to continuous trajectories. Here, we describe DNA, which uses reward shaping in local, modified Q-learning problems to solve for distinct policies with guaranteed epsilon-optimality. We show that it successfully returns qualitatively different policies that constitute meaningfully different "options" in simulation, including a brief comparison to related approaches in the stochastic optimization field of Quality Diversity. Beyond the explanatory motivation, this work opens new possibilities for exploration and adaptive planning in RL.