LG AI MLNov 29, 2018

Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback

Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek K. Umrawal

arXiv:1811.11925v27.511 citations

Originality Highly original

AI Analysis

This addresses the challenge of efficient exploration-exploitation in applications like Social Influence Maximization, representing an incremental advance by extending combinatorial bandits to handle non-linear feedback.

The paper tackles the problem of selecting the best K out of N options in combinatorial bandits with non-linear feedback, proposing a computationally efficient algorithm with linear storage that achieves a sub-linear regret bound of ˜O(K^{1/2}N^{1/3}T^{2/3}).

Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$. %When applied to the problem of Social Influence Maximization, the performance of the proposed algorithm surpasses the UCB algorithm and some more sophisticated domain-specific methods.

View on arXiv PDF

Similar