Sourav Sahoo

LG
5papers
46citations
Novelty50%
AI Score27

5 Papers

LGSep 28, 2022
Online Subset Selection using $α$-Core with no Augmented Regret

Sourav Sahoo, Siddhant Chaudhary, Samrat Mukhopadhyay et al.

We revisit the classic problem of optimal subset selection in the online learning set-up. Assume that the set $[N]$ consists of $N$ distinct elements. On the $t$th round, an adversary chooses a monotone reward function $f_t: 2^{[N]} \to \mathbb{R}_+$ that assigns a non-negative reward to each subset of $[N].$ An online policy selects (perhaps randomly) a subset $S_t \subseteq [N]$ consisting of $k$ elements before the reward function $f_t$ for the $t$th round is revealed to the learner. As a consequence of its choice, the policy receives a reward of $f_t(S_t)$ on the $t$th round. Our goal is to design an online sequential subset selection policy to maximize the expected cumulative reward accumulated over a time horizon. In this connection, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The proposed SCore policy is based on a new polyhedral characterization of the reward functions called $α$-Core - a generalization of Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called $α$-augmented regret. In this new metric, the performance of the online policy is compared with an unrestricted offline benchmark that can select all $N$ elements at every round. We show that a large class of reward functions, including submodular, can be efficiently optimized with the SCore policy. We also extend the proposed policy to the optimistic learning set-up where the learner has access to additional untrusted hints regarding the reward functions. Finally, we conclude the paper with a list of open problems.

DSJun 6, 2024
Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions

Negin Golrezaei, Sourav Sahoo

We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a value-maximizing buyer. The buyer aims to maximize their cumulative value over $T$ rounds while adhering to per-round return-on-investment (RoI) constraints in a strategic (or adversarial) environment. Using an $m$-uniform bidding format, the buyer submits $m$ bid-quantity pairs $(b_i, q_i)$ to demand $q_i$ units at bid $b_i$, with $m \ll M$ in practice, where $M$ denotes the maximum demand of the buyer. We introduce the notion of safe bidding strategies as those that satisfy the RoI constraints irrespective of competing bids. Despite the stringent requirement, we show that these strategies satisfy a mild no-overbidding condition, depend only on the valuation curve of the bidder, and the bidder can focus on a finite subset without loss of generality. Though the subset size is $O(M^m)$, we design a polynomial-time learning algorithm that achieves sublinear regret, both in full-information and bandit settings, relative to the hindsight-optimal safe strategy. We assess the robustness of safe strategies against the hindsight-optimal strategy from a richer class. We define the richness ratio $α\in (0,1]$ as the minimum ratio of the value of the optimal safe strategy to that of the optimal strategy from richer class and construct hard instances showing the tightness of $α$. Our algorithm achieves $α$-approximate sublinear regret against these stronger benchmarks. Simulations on semi-synthetic auction data show that empirical richness ratios significantly outperform the theoretical worst-case bounds. The proposed safe strategies and learning algorithm extend naturally to more nuanced buyer and competitor models.

ITOct 15, 2021
$k\texttt{-experts}$ -- Online Policies and Fundamental Limits

Samrat Mukhopadhyay, Sourav Sahoo, Abhishek Sinha

We introduce the $\texttt{$k$-experts}$ problem - a generalization of the classic Prediction with Expert's Advice framework. Unlike the classic version, where the learner selects exactly one expert from a pool of $N$ experts at each round, in this problem, the learner can select a subset of $k$ experts at each round $(1\leq k\leq N)$. The reward obtained by the learner at each round is assumed to be a function of the $k$ selected experts. The primary objective is to design an online learning policy with a small regret. In this pursuit, we propose $\texttt{SAGE}$ ($\textbf{Sa}$mpled Hed$\textbf{ge}$) - a framework for designing efficient online learning policies by leveraging statistical sampling techniques. For a wide class of reward functions, we show that $\texttt{SAGE}$ either achieves the first sublinear regret guarantee or improves upon the existing ones. Furthermore, going beyond the notion of regret, we fully characterize the mistake bounds achievable by online learning policies for stable loss functions. We conclude the paper by establishing a tight regret lower bound for a variant of the $\texttt{$k$-experts}$ problem and carrying out experiments with standard datasets.

OCSep 25, 2021
Distributed Online Optimization with Byzantine Adversarial Agents

Sourav Sahoo, Anand Gokhale, Rachel Kalpana Kalaimani

We study the problem of non-constrained, discrete-time, online distributed optimization in a multi-agent system where some of the agents do not follow the prescribed update rule either due to failures or malicious intentions. None of the agents have prior information about the identities of the faulty agents and any agent can communicate only with its immediate neighbours. At each time step, a locally Lipschitz strongly convex cost function is revealed locally to all the agents and the non-faulty agents update their states using their local information and the information obtained from their neighbours. We measure the performance of the online algorithm by comparing it to its offline version, when the cost functions are known apriori. The difference between the same is termed as regret. Under sufficient conditions on the graph topology, the number and location of the adversaries, the defined regret grows sublinearly. We further conduct numerical experiments to validate our theoretical results.

LGNov 30, 2020
Multi-Modal Detection of Alzheimer's Disease from Speech and Text

Amish Mittal, Sourav Sahoo, Arnhav Datar et al.

Reliable detection of the prodromal stages of Alzheimer's disease (AD) remains difficult even today because, unlike other neurocognitive impairments, there is no definitive diagnosis of AD in vivo. In this context, existing research has shown that patients often develop language impairment even in mild AD conditions. We propose a multimodal deep learning method that utilizes speech and the corresponding transcript simultaneously to detect AD. For audio signals, the proposed audio-based network, a convolutional neural network (CNN) based model, predicts the diagnosis for multiple speech segments, which are combined for the final prediction. Similarly, we use contextual embedding extracted from BERT concatenated with a CNN-generated embedding for classifying the transcript. The individual predictions of the two models are then combined to make the final classification. We also perform experiments to analyze the model performance when Automated Speech Recognition (ASR) system generated transcripts are used instead of manual transcription in the text-based model. The proposed method achieves 85.3% 10-fold cross-validation accuracy when trained and evaluated on the Dementiabank Pitt corpus.