LG SI Apr 14, 2017

Lean From Thy Neighbor: Stochastic & Adversarial Bandits in a Network

arXiv:1704.04470v11.41 citations

Originality: Incremental advance

AI Analysis

The paper addresses decision-making in networked environments, with applications in sociology and economics, and offers incremental improvements over prior bandit algorithms.

The paper studies how individuals in a social network can use observations of their neighbors' actions and rewards to improve their own decisions in multi-armed bandit settings. The authors show that the regret of their algorithms interpolates between the classical bandit bound and the full-information bound, and that the algorithms outperform existing methods in simulations.

An individual's decisions are often guided by those of his or her peers, i.e., neighbors in a social network. Presumably, being privy to the experiences of others aids in learning and decision making, but how much advantage does an individual gain by observing her neighbors? Such problems appear in sociology and economics. In this paper, we present a novel model to capture such decision-making processes and appeal to the classical multi-armed bandit framework to analyze it. Each individual, in addition to her own actions, can observe the actions and rewards obtained by her neighbors, and can use all of this information to minimize her own regret. We provide algorithms for this setting, both for stochastic and adversarial bandits, and show that their regret smoothly interpolates between the regret in the classical bandit setting and that of the full-information setting as a function of the neighbors' exploration. In the stochastic setting, the additional information must simply be incorporated into the usual estimation of the rewards, while in the adversarial setting this is attained by constructing a new unbiased estimator for the rewards and appropriately bounding the amount of additional information provided by the neighbors. These algorithms are optimal up to log factors; despite the fact that the agents act independently and selfishly, this implies that it is an approximate Nash equilibrium for all agents to use our algorithms. Further, we show via empirical simulations that our algorithms outperform, often significantly, existing algorithms that one could apply to this setting.
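To make the stochastic case concrete, here is a minimal sketch in Python of a UCB1-style learner that folds neighbors' observed (arm, reward) pairs into its own reward estimates. It is an illustration under stated assumptions, not the paper's algorithm: the class name NeighborUCB, the function simulate, the Bernoulli rewards, and the uniformly exploring neighbors are all hypothetical choices made for this example.

```python
import math
import random


class NeighborUCB:
    """UCB1-style learner whose estimates pool its own and its neighbors' samples.

    Illustrative only; not the algorithm from the paper.
    """

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # samples seen per arm (own + neighbors')
        self.sums = [0.0] * n_arms   # summed rewards per arm
        self.t = 0                   # rounds this agent has played

    def observe(self, arm, reward):
        """Record one (arm, reward) sample, regardless of who generated it."""
        self.counts[arm] += 1
        self.sums[arm] += reward

    def select(self):
        """Return the arm with the highest upper confidence bound."""
        self.t += 1
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm  # first play any arm nobody has sampled yet

        def ucb(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=ucb)


def simulate(means, n_neighbors, horizon, seed=0):
    """Return the agent's cumulative pseudo-regret on Bernoulli arms.

    Assumption: every neighbor pulls an arm uniformly at random each round,
    and the agent sees each neighbor's arm and reward.
    """
    rng = random.Random(seed)
    agent = NeighborUCB(len(means))
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        arm = agent.select()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        agent.observe(arm, reward)
        regret += best - means[arm]
        for _ in range(n_neighbors):
            n_arm = rng.randrange(len(means))
            n_reward = 1.0 if rng.random() < means[n_arm] else 0.0
            agent.observe(n_arm, n_reward)
    return regret
```

Calling simulate([0.3, 0.7], n_neighbors=0, horizon=2000) gives the classical bandit baseline, and raising n_neighbors supplies more side observations per round. Intuitively, the more the neighbors explore, the closer the agent's information gets to the full-information setting described in the abstract.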

