LG · ML · Jun 6, 2020

Contextual Bandits with Side-Observations

arXiv:2006.03951v2 · 8 citations
Originality: Incremental advance
AI Analysis

This paper addresses recommendation for users in social networks, using side-observations from a user's neighbors to reduce regret. It is a domain-specific, incremental improvement.

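The paper studies contextual bandits with side-observations in social networks in order to design recommendation algorithms. It shows that the proposed algorithm is asymptotically optimal, with regret upper-bounded by $O\left(|\chi(\mathcal{G})|\log T\right)$, where $|\chi(\mathcal{G})|$ is the domination number of the social graph $\mathcal{G}$, compared with $O\left(N\log T\right)$ for a naive application of existing methods to $N$ users.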

We investigate contextual bandits in the presence of side-observations across arms in order to design recommendation algorithms for users connected via social networks. Users in social networks respond to their friends' activity, and hence provide information about each other's preferences. In our model, when a learning algorithm recommends an article to a user, not only does it observe that user's response (e.g. an ad click), but also the side-observations, i.e., the responses of the user's neighbors if they were presented with the same article. We model these observation dependencies by a graph $\mathcal{G}$ in which nodes correspond to users, and edges correspond to social links. We derive a problem/instance-dependent lower-bound on the regret of any consistent algorithm. We propose an optimization (linear programming) based data-driven learning algorithm that utilizes the structure of $\mathcal{G}$ in order to make recommendations to users and show that it is asymptotically optimal, in the sense that its regret matches the lower-bound as the number of rounds $T\to\infty$. We show that this asymptotically optimal regret is upper-bounded as $O\left(|\chi(\mathcal{G})|\log T\right)$, where $|\chi(\mathcal{G})|$ is the domination number of $\mathcal{G}$. In contrast, a naive application of the existing learning algorithms results in $O\left(N\log T\right)$ regret, where $N$ is the number of users.
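To make the gap between the domination number and $N$ concrete, here is a minimal Python sketch that builds a dominating set greedily. It is an illustration only, not the paper's linear-programming algorithm: the function names, the adjacency-dict representation, and the greedy rule are all assumptions introduced here. Computing the exact domination number is NP-hard in general, so the greedy set size is only an upper bound on $|\chi(\mathcal{G})|$.

```python
from typing import Dict, Hashable, Set

# Hypothetical representation: an undirected graph as {user: set of neighbors}.
Graph = Dict[Hashable, Set[Hashable]]


def greedy_dominating_set(graph: Graph) -> Set[Hashable]:
    """Greedily pick users until every user is picked or adjacent to a picked user."""
    uncovered = set(graph)
    dominating: Set[Hashable] = set()
    while uncovered:
        # Choose the user whose closed neighborhood covers the most uncovered users.
        best = max(graph, key=lambda u: len(({u} | graph[u]) & uncovered))
        dominating.add(best)
        uncovered -= {best} | graph[best]
    return dominating


def is_dominating(graph: Graph, chosen: Set[Hashable]) -> bool:
    """Check that every user is in `chosen` or has a neighbor in `chosen`."""
    return all(u in chosen or graph[u] & chosen for u in graph)


# Star network: user 0 is linked to users 1..4, so user 0 alone dominates the graph.
star: Graph = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
picked = greedy_dominating_set(star)
print(len(picked), "users to explore instead of", len(star))
```

In the star example, recommending an article to the hub user yields side-observations for all five users, which is the intuition behind regret that scales with the domination number rather than with $N$.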
