GTAIJul 13, 2020

Fair Algorithms for Multi-Agent Multi-Armed Bandits

arXiv:2007.06699v265 citations
AI Analysis

This work addresses fairness in resource allocation for multiple agents with conflicting preferences, representing an incremental extension of bandit algorithms to incorporate economic fairness notions.

The paper tackles the problem of learning a fair distribution over arms in a multi-agent multi-armed bandit setting, where each agent has different reward perceptions, and shows that proposed variants of classic algorithms achieve sublinear regret in terms of lost Nash social welfare.

We propose a multi-agent variant of the classical multi-armed bandit problem, in which there are $N$ agents and $K$ arms, and pulling an arm generates a (possibly different) stochastic reward for each agent. Unlike the classical multi-armed bandit problem, the goal is not to learn the "best arm"; indeed, each agent may perceive a different arm to be the best for her personally. Instead, we seek to learn a fair distribution over the arms. Drawing on a long line of research in economics and computer science, we use the Nash social welfare as our notion of fairness. We design multi-agent variants of three classic multi-armed bandit algorithms and show that they achieve sublinear regret, which is now measured in terms of the lost Nash social welfare.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes