Rishab Balasubramanian

h-index2

7papers

23citations

Novelty60%

AI Score48

Ranked #28,010 of 194,257 authors (top 14%)#6,656 in LG (top 17%)

7 Papers

5.3LGOct 8, 2023Code

Adversarial Attacks on Combinatorial Multi-Armed Bandits

Rishab Balasubramanian, Jiawei Li, Prasad Tadepalli et al.

We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a sufficient and necessary condition for the attackability of CMAB, a notion to capture the vulnerability and robustness of CMAB. The attackability condition depends on the intrinsic properties of the corresponding CMAB instance such as the reward distributions of super arms and outcome distributions of base arms. Additionally, we devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals a surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice and a general attack strategy for any CMAB instance does not exist since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications including probabilistic maximum covering problem, online minimum spanning tree, cascading bandits for online ranking, and online shortest path.

9.8LGJun 13, 2023Code

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

Zeyu Zhang, Yi Su, Hui Yuan et al.

Off-policy Learning to Rank (LTR) aims to optimize a ranker from data collected by a deployed logging policy. However, existing off-policy learning to rank methods often make strong assumptions about how users generate the click data, i.e., the click model, and hence need to tailor their methods specifically under different click models. In this paper, we unified the ranking process under general stochastic click models as a Markov Decision Process (MDP), and the optimal ranking could be learned with offline reinforcement learning (RL) directly. Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which could be easily applied to a wide range of click models. Through a dedicated formulation of the MDP, we show that offline RL algorithms can adapt to various click models without complex debiasing techniques and prior knowledge of the model. Results on various large-scale datasets demonstrate that CUOLR consistently outperforms the state-of-the-art off-policy learning to rank algorithms while maintaining consistency and robustness under different click models.

27.6LGApr 7Code

The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma et al.

We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment. Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants, aligns it with a Target model through a low-rank linear transformation, and applies it at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without training. For example, transferring CoT reasoning from Qwen1.5-14B to Qwen1.5-7B yields an accuracy gain of 12.1% on MATH, and transferring a mathematical reasoning direction from Qwen3-4B-Base to Qwen3-14B-Base improves AGIEval Math accuracy from 61.1% to 71.3%, surpassing the 67.8% achieved by the 14B post-trained model. Our analysis shows that the success of transfer depends on the capabilities learned during pre-training, and that our intervention amplifies latent capabilities by sharpening the output distribution toward successful reasoning trajectories.

2.6CVAug 12, 2022Code

Contrastive Learning for Object Detection

Rishab Balasubramanian, Kunal Rathore

Contrastive learning is commonly used as a method of self-supervised learning with the "anchor" and "positive" being two random augmentations of a given input image, and the "negative" is the set of all other images. However, the requirement of large batch sizes and memory banks has made it difficult and slow to train. This has motivated the rise of Supervised Contrasative approaches that overcome these problems by using annotated data. We look to further improve supervised contrastive learning by ranking classes based on their similarity, and observe the impact of human bias (in the form of ranking) on the learned representations. We feel this is an important question to address, as learning good feature embeddings has been a long sought after problem in computer vision.

1.4CVAug 12, 2022Code

Contrastive Learning for OOD in Object detection

Rishab Balasubramanian, Rupashree Dey, Kunal Rathore

Contrastive learning is commonly applied to self-supervised learning, and has been shown to outperform traditional approaches such as the triplet loss and N-pair loss. However, the requirement of large batch sizes and memory banks has made it difficult and slow to train. Recently, Supervised Contrasative approaches have been developed to overcome these problems. They focus more on learning a good representation for each class individually, or between a cluster of classes. In this work we attempt to rank classes based on similarity using a user-defined ranking, to learn an efficient representation between all classes. We observe how incorporating human bias into the learning process could improve learning representations in the parameter space. We show that our results are comparable to Supervised Contrastive Learning for image classification and object detection, and discuss it's shortcomings in OOD Detection

9.8LGMay 30, 2023

Adversarial Attacks on Online Learning to Rank with Stochastic Click Models

Zichen Wang, Rishab Balasubramanian, Hui Yuan et al.

We propose the first study of adversarial attacks on online learning to rank. The goal of the adversary is to misguide the online learning to rank algorithm to place the target item on top of the ranking list linear times to time horizon $T$ with a sublinear attack cost. We propose generalized list poisoning attacks that perturb the ranking list presented to the user. This strategy can efficiently attack any no-regret ranker in general stochastic click models. Furthermore, we propose a click poisoning-based strategy named attack-then-quit that can efficiently attack two representative OLTR algorithms for stochastic click models. We theoretically analyze the success and cost upper bound of the two proposed methods. Experimental results based on synthetic and real-world data further validate the effectiveness and cost-efficiency of the proposed attack strategies.

7.0RONov 2, 2020Code

Risk-Aware Submodular Optimization for Multi-objective Travelling Salesperson Problem

Rishab Balasubramanian, Lifeng Zhou, Pratap Tokekar et al.

We introduce a risk-aware multi-objective Traveling Salesperson Problem (TSP) variant, where the robot tour cost and tour reward have to be optimized simultaneously. The robot obtains reward along the edges in the graph. We study the case where the rewards and the costs exhibit diminishing marginal gains, i.e., are submodular. Unlike prior work, we focus on the scenario where the costs and the rewards are uncertain and seek to maximize the Conditional-Value-at-Risk (CVaR) metric of the submodular function. We propose a risk-aware greedy algorithm (RAGA) to find a bounded-approximation algorithm. The approximation algorithm runs in polynomial time and is within a constant factor of the optimal and an additive term that depends on the optimal solution. We use the submodular function's curvature to improve approximation results further and verify the algorithm's performance through simulations.