Sourav Sahoo

1.8LGSep 28, 2022

Online Subset Selection using $α$-Core with no Augmented Regret

Sourav Sahoo, Siddhant Chaudhary, Samrat Mukhopadhyay et al.

We revisit the classic problem of optimal subset selection in the online learning set-up. Assume that the set $[N]$ consists of $N$ distinct elements. On the $t$th round, an adversary chooses a monotone reward function $f_t: 2^{[N]} \to \mathbb{R}_+$ that assigns a non-negative reward to each subset of $[N].$ An online policy selects (perhaps randomly) a subset $S_t \subseteq [N]$ consisting of $k$ elements before the reward function $f_t$ for the $t$th round is revealed to the learner. As a consequence of its choice, the policy receives a reward of $f_t(S_t)$ on the $t$th round. Our goal is to design an online sequential subset selection policy to maximize the expected cumulative reward accumulated over a time horizon. In this connection, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The proposed SCore policy is based on a new polyhedral characterization of the reward functions called $α$-Core - a generalization of Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called $α$-augmented regret. In this new metric, the performance of the online policy is compared with an unrestricted offline benchmark that can select all $N$ elements at every round. We show that a large class of reward functions, including submodular, can be efficiently optimized with the SCore policy. We also extend the proposed policy to the optimistic learning set-up where the learner has access to additional untrusted hints regarding the reward functions. Finally, we conclude the paper with a list of open problems.

5.6OCSep 25, 2021

Distributed Online Optimization with Byzantine Adversarial Agents

Sourav Sahoo, Anand Gokhale, Rachel Kalpana Kalaimani

We study the problem of non-constrained, discrete-time, online distributed optimization in a multi-agent system where some of the agents do not follow the prescribed update rule either due to failures or malicious intentions. None of the agents have prior information about the identities of the faulty agents and any agent can communicate only with its immediate neighbours. At each time step, a locally Lipschitz strongly convex cost function is revealed locally to all the agents and the non-faulty agents update their states using their local information and the information obtained from their neighbours. We measure the performance of the online algorithm by comparing it to its offline version, when the cost functions are known apriori. The difference between the same is termed as regret. Under sufficient conditions on the graph topology, the number and location of the adversaries, the defined regret grows sublinearly. We further conduct numerical experiments to validate our theoretical results.

Sourav Sahoo

2 Papers