Ativ Joshi

3papers

15citations

Novelty67%

AI Score42

Ranked #87,930 of 201,326 authors (top 44%)#19,736 in LG (top 46%)

3 Papers

LGMar 11, 2023

No-regret Algorithms for Fair Resource Allocation

Abhishek Sinha, Ativ Joshi, Rajarshi Bhattacharjee et al.

We consider a fair resource allocation problem in the no-regret setting against an unrestricted adversary. The objective is to allocate resources equitably among several agents in an online fashion so that the difference of the aggregate $α$-fair utilities of the agents between an optimal static clairvoyant allocation and that of the online policy grows sub-linearly with time. The problem is challenging due to the non-additive nature of the $α$-fairness function. Previously, it was shown that no online policy can exist for this problem with a sublinear standard regret. In this paper, we propose an efficient online resource allocation policy, called Online Proportional Fair (OPF), that achieves $c_α$-approximate sublinear regret with the approximation factor $c_α=(1-α)^{-(1-α)}\leq 1.445,$ for $0\leq α< 1$. The upper bound to the $c_α$-regret for this problem exhibits a surprising phase transition phenomenon. The regret bound changes from a power-law to a constant at the critical exponent $α=\frac{1}{2}.$ As a corollary, our result also resolves an open problem raised by Even-Dar et al. [2009] on designing an efficient no-regret policy for the online job scheduling problem in certain parameter regimes. The proof of our results introduces new algorithmic and analytical techniques, including greedy estimation of the future gradients for non-additive global reward functions and bootstrapping adaptive regret bounds, which may be of independent interest.

97.2LGMar 11

Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control

Yaswanth Chittepu, Ativ Joshi, Rajarshi Bhattacharjee et al.

Safe Reinforcement Learning from Human Feedback (RLHF) typically enforces safety through expected cost constraints, but the expectation captures only a single statistic of the cost distribution and fails to account for distributional uncertainty, particularly under heavy tails or rare catastrophic events. This limitation is problematic when robustness and risk sensitivity are critical. Stochastic dominance offers a principled alternative by comparing entire cost distributions rather than just their averages, enabling direct control over tail risks and potential out-of-distribution failures that expectation-based constraints may overlook. In this work, we propose Risk-sensitive Alignment via Dominance (RAD), a novel alignment framework that replaces scalar expected cost constraints with First-Order Stochastic Dominance (FSD) constraints. We operationalize this constraint by comparing the target policy's cost distribution to that of a reference policy within an Optimal Transport (OT) framework, using entropic regularization and Sinkhorn iterations to obtain a differentiable and computationally efficient objective for stable end-to-end optimization. Furthermore, we introduce quantile-weighted FSD constraints and show that weighted FSD universally controls a broad class of Spectral Risk Measures (SRMs), so that improvements under weighted dominance imply guaranteed improvements in the corresponding spectral risk. This provides a principled mechanism for tuning a model's risk profile via the quantile weighting function. Empirical results demonstrate that RAD improves harmlessness over baselines while remaining competitive in helpfulness, and exhibits greater robustness on out-of-distribution harmlessness evaluations.

ITMay 10, 2022

Universal Caching

Ativ Joshi, Abhishek Sinha

In learning theory, the performance of an online policy is commonly measured in terms of the static regret metric, which compares the cumulative loss of an online policy to that of an optimal benchmark in hindsight. In the definition of static regret, the action of the benchmark policy remains fixed throughout the time horizon. Naturally, the resulting regret bounds become loose in non-stationary settings where fixed actions often suffer from poor performance. In this paper, we investigate a stronger notion of regret minimization in the context of online caching. In particular, we allow the action of the benchmark at any round to be decided by a finite state machine containing any number of states. Popular caching policies, such as LRU and FIFO, belong to this class. Using ideas from the universal prediction literature in information theory, we propose an efficient online caching policy with a sub-linear regret bound. To the best of our knowledge, this is the first data-dependent regret bound known for the caching problem in the universal setting. We establish this result by combining a recently-proposed online caching policy with an incremental parsing algorithm, namely Lempel-Ziv '78. Our methods also yield a simpler learning-theoretic proof of the improved regret bound as opposed to the involved problem-specific combinatorial arguments used in the earlier works.