LG | CY | DC | Jun 7, 2023

Fair Multi-Agent Bandits

arXiv:2306.04498v2 | 2 citations | h-index: 35
AI Analysis

This paper addresses fairness in distributed bandit learning for settings such as resource allocation. The contribution is incremental: it builds on existing methods and refines their analysis.

The paper tackles fair multi-agent multi-arm bandit learning without direct communication between agents. It achieves a regret bound of $O\left(N^3 \log \frac{B}{\Delta}\, f(\log T) \log T\right)$, which improves on prior work by reducing the dependence on the number of agents $N$ from exponential to polynomial.
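To get a feel for how the bound scales, the sketch below evaluates the expression inside the $O(\cdot)$ for given parameters. It drops the hidden constant, so only ratios between values are meaningful. The function name `regret_bound_order`, the default choice $f(t) = \log(1 + t)$, and the requirement $0 < \Delta < B$ are illustrative assumptions, not taken from the paper; the source only requires that $f$ diverge.

```python
import math


def regret_bound_order(n_agents, reward_bound, gap, horizon,
                       f=lambda t: math.log(1.0 + t)):
    """Evaluate N^3 * log(B / Delta) * f(log T) * log T (no hidden constant).

    f is any function diverging to infinity; log(1 + t) is an assumed example.
    """
    if not 0 < gap < reward_bound:
        raise ValueError("expected 0 < gap < reward_bound")
    if horizon <= 1:
        raise ValueError("expected horizon > 1")
    log_t = math.log(horizon)
    return n_agents ** 3 * math.log(reward_bound / gap) * f(log_t) * log_t


if __name__ == "__main__":
    # Compare how the expression grows as the number of agents doubles.
    for n in (2, 4, 8):
        print(n, regret_bound_order(n, reward_bound=1.0, gap=0.1, horizon=10**6))
```

With all other parameters fixed, doubling the number of agents multiplies the expression by $2^3 = 8$, which is the polynomial dependence the summary refers to.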

In this paper, we study the problem of fair multi-agent multi-arm bandit learning when agents do not communicate with each other, except for collision information, which is provided to agents accessing the same arm simultaneously. We provide an algorithm with regret $O\left(N^3 \log \frac{B}{\Delta}\, f(\log T) \log T \right)$ (assuming bounded rewards, with unknown bound), where $f(t)$ is any function diverging to infinity with $t$. This significantly improves on previous results, which had the same $O(f(\log T) \log T)$ order in $T$ but an exponential dependence on the number of agents. The result is attained by using a distributed auction algorithm to learn the sample-optimal matching and a novel order-statistics-based regret analysis. Simulation results present the dependence of the regret on $\log T$.
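For readers unfamiliar with auction algorithms, here is a minimal centralized sketch of a Bertsekas-style auction for a square assignment problem. It is not the paper's distributed algorithm: it assumes a shared price vector, a max-sum objective (the fairness objective is not specified in this summary), and as many arms as agents. The name `auction_assignment` and the `values` matrix, which stands in for estimated mean rewards, are hypothetical.

```python
def auction_assignment(values, eps):
    """Centralized auction for a square max-sum assignment problem.

    values[i][j] is the (estimated) value of arm j to agent i.
    Returns a list where entry i is the arm assigned to agent i.
    The total value is within n * eps of the optimum.
    """
    n = len(values)
    prices = [0.0] * n        # current price of each arm
    owner = [None] * n        # owner[j]: agent currently holding arm j
    assigned = [None] * n     # assigned[i]: arm currently held by agent i
    unassigned = list(range(n))

    while unassigned:
        i = unassigned.pop()
        # Net value of every arm to agent i at current prices.
        net = [values[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=net.__getitem__)
        best_val = net[best]
        second_val = max((net[j] for j in range(n) if j != best),
                         default=best_val)
        # Bid: raise the price by the margin over the runner-up, plus eps.
        prices[best] += best_val - second_val + eps
        previous = owner[best]
        if previous is not None:
            # The outbid agent re-enters the auction.
            assigned[previous] = None
            unassigned.append(previous)
        owner[best] = i
        assigned[i] = best
    return assigned


if __name__ == "__main__":
    values = [[10, 5, 1],
              [9, 8, 2],
              [3, 7, 6]]
    print(auction_assignment(values, eps=0.1))
```

For integer-valued `values`, choosing `eps` below `1 / n` makes the returned assignment exactly optimal for the max-sum objective. In a setting without communication, the shared price vector would have to be replaced by information agents can infer locally, for instance from collisions; how the paper does this is not described in the text above.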

