Sushant Vijayan

h-index1

4papers

7citations

Novelty65%

AI Score38

Ranked #85,782 of 194,257 authors (top 44%)#19,030 in LG (top 47%)

4 Papers

2.0LGMar 14, 2023

Best arm identification in rare events

Anirban Bhattacharjee, Sushant Vijayan, Sandeep K Juneja

We consider the best arm identification problem in the stochastic multi-armed bandit framework where each arm has a tiny probability of realizing large rewards while with overwhelming probability the reward is zero. A key application of this framework is in online advertising where click rates of advertisements could be a fraction of a single percent and final conversion to sales, while highly profitable, may again be a small fraction of the click rates. Lately, algorithms for BAI problems have been developed that minimise sample complexity while providing statistical guarantees on the correct arm selection. As we observe, these algorithms can be computationally prohibitive. We exploit the fact that the reward process for each arm is well approximated by a Compound Poisson process to arrive at algorithms that are faster, with a small increase in sample complexity. We analyze the problem in an asymptotic regime as rarity of reward occurrence reduces to zero, and reward amounts increase to infinity. This helps illustrate the benefits of the proposed algorithm. It also sheds light on the underlying structure of the optimal BAI algorithms in the rare event setting.

9.4LGMar 5, 2025

Online Bidding under RoS Constraints without Knowing the Value

Sushant Vijayan, Zhe Feng, Swati Padmanabhan et al.

We consider the problem of bidding in online advertising, where an advertiser aims to maximize value while adhering to budget and Return-on-Spend (RoS) constraints. Unlike prior work that assumes knowledge of the value generated by winning each impression ({e.g.,} conversions), we address the more realistic setting where the advertiser must simultaneously learn the optimal bidding strategy and the value of each impression opportunity. This introduces a challenging exploration-exploitation dilemma: the advertiser must balance exploring different bids to estimate impression values with exploiting current knowledge to bid effectively. To address this, we propose a novel Upper Confidence Bound (UCB)-style algorithm that carefully manages this trade-off. Via a rigorous theoretical analysis, we prove that our algorithm achieves $\widetilde{O}(\sqrt{T\log(|\mathcal{B}|T)})$ regret and constraint violation, where $T$ is the number of bidding rounds and $\mathcal{B}$ is the domain of possible bids. This establishes the first optimal regret and constraint violation bounds for bidding in the online setting with unknown impression values. Moreover, our algorithm is computationally efficient and simple to implement. We validate our theoretical findings through experiments on synthetic data, demonstrating that our algorithm exhibits strong empirical performance compared to existing approaches.

7.1LGAug 11, 2025

Regret minimization in Linear Bandits with offline data via extended D-optimal exploration

Sushant Vijayan, Arun Suggala, Karthikeyan Shanmugam et al.

We consider the problem of online regret minimization in linear bandits with access to prior observations (offline data) from the underlying bandit model. There are numerous applications where extensive offline data is often available, such as in recommendation systems, online advertising. Consequently, this problem has been studied intensively in recent literature. Our algorithm, Offline-Online Phased Elimination (OOPE), effectively incorporates the offline data to substantially reduce the online regret compared to prior work. To leverage offline information prudently, OOPE uses an extended D-optimal design within each exploration phase. OOPE achieves an online regret is $\tilde{O}(\sqrt{\deff T \log \left(|\mathcal{A}|T\right)}+d^2)$. $\deff \leq d)$ is the effective problem dimension which measures the number of poorly explored directions in offline data and depends on the eigen-spectrum $(λ_k)_{k \in [d]}$ of the Gram matrix of the offline data. The eigen-spectrum $(λ_k)_{k \in [d]}$ is a quantitative measure of the \emph{quality} of offline data. If the offline data is poorly explored ($\deff \approx d$), we recover the established regret bounds for purely online setting while, when offline data is abundant ($\Toff >> T$) and well-explored ($\deff = o(1) $), the online regret reduces substantially. Additionally, we provide the first known minimax regret lower bounds in this setting that depend explicitly on the quality of the offline data. These lower bounds establish the optimality of our algorithm in regimes where offline data is either well-explored or poorly explored. Finally, by using a Frank-Wolfe approximation to the extended optimal design we further improve the $O(d^{2})$ term to $O\left(\frac{d^{2}}{\deff} \min \{ \deff,1\} \right)$, which can be substantial in high dimensions with moderate quality of offline data $\deff = Ω(1)$.

4.1LGApr 26, 2025

Towards minimax optimal algorithms for Active Simple Hypothesis Testing

Sushant Vijayan

We study the Active Simple Hypothesis Testing (ASHT) problem, a simpler variant of the Fixed Budget Best Arm Identification problem. In this work, we provide novel game theoretic formulation of the upper bounds of the ASHT problem. This formulation allows us to leverage tools of differential games and Partial Differential Equations (PDEs) to propose an approximately optimal algorithm that is computationally tractable compared to prior work. However, the optimal algorithm still suffers from a curse of dimensionality and instead we use a novel link to Blackwell Approachability to propose an algorithm that is far more efficient computationally. We show that this new algorithm, although not proven to be optimal, is always better than static algorithms in all instances of ASHT and is numerically observed to attain the optimal exponent in various instances.