LG · May 8, 2025

Bandit Max-Min Fair Allocation

Tsubasa Harada, Shinji Ito, Hanna Sumita

arXiv:2505.05169v1 · 7.11 citations · h-index: 16 · ECML/PKDD

Originality: Incremental advance

AI Analysis

This work addresses fair resource allocation in multi-agent systems with limited feedback. It is an incremental advance that extends existing methods to a new bandit setting.

The paper tackles the bandit max-min fair allocation problem, maximizing the minimum utility among agents with additive valuations under semi-bandit feedback. It achieves an asymptotic regret bound of $O(m\sqrt{T}\ln T/n + m\sqrt{T\ln(mnT)})$ and provides a lower bound of $\Omega(m\sqrt{T}/n)$.
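To see how far apart the two bounds are, one can divide the upper bound by the lower bound (a direct calculation from the stated expressions, not a result quoted from the paper):

```latex
\frac{m\sqrt{T}\,\ln T/n \;+\; m\sqrt{T\ln(mnT)}}{m\sqrt{T}/n}
  \;=\; \ln T \;+\; n\sqrt{\ln(mnT)}
```

The first term is a pure $\ln T$ factor. The second grows only like $\sqrt{\ln T}$ in $T$, so once $T$ is large enough relative to $n$ that $n\sqrt{\ln(mnT)} \le \ln T$, the ratio is $O(\ln T)$.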

In this paper, we study a new decision-making problem called the bandit max-min fair allocation (BMMFA) problem. The goal of this problem is to maximize the minimum utility among agents with additive valuations by repeatedly assigning indivisible goods to them. One key feature of this problem is that each agent's valuation for each item can only be observed through semi-bandit feedback, whereas existing work assumes that the item values are provided at the beginning of each round. Another key feature is that, unlike in most bandit problems, the algorithm's reward function is not additive across rounds. Our first contribution is an algorithm with an asymptotic regret bound of $O(m\sqrt{T}\ln T/n + m\sqrt{T \ln(mnT)})$, where $n$ is the number of agents, $m$ is the number of items, and $T$ is the time horizon. It is based on a novel combination of bandit techniques and a resource allocation algorithm studied in the literature on competitive analysis. Our second contribution is a regret lower bound of $\Omega(m\sqrt{T}/n)$. When $T$ is sufficiently larger than $n$, the gap between the upper and lower bounds is a logarithmic factor in $T$.
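The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the problem setup under stated assumptions. Item values are assumed to be Bernoulli draws with hypothetical means `mu[i][j]`. The allocation rule (hand each item to the agent with the lowest tentative utility, crediting that agent with a UCB estimate of the item's value) is a hypothetical greedy heuristic, not the authors' method. The function name `run_bmmfa` is likewise illustrative. The sketch shows the two features the abstract highlights: values are observed only for assigned agent-item pairs (semi-bandit feedback), and the reward is the minimum of the agents' cumulative utilities, which is not a sum of per-round rewards.

```python
import math
import random


def run_bmmfa(mu, T, seed=0):
    """Simulate T rounds of a toy BMMFA instance (hypothetical setup).

    mu[i][j] is the assumed mean value of item j for agent i; realized values
    are Bernoulli(mu[i][j]) draws. Returns each agent's cumulative utility.
    The BMMFA reward is min(result), which is not additive over rounds.
    """
    rng = random.Random(seed)
    n, m = len(mu), len(mu[0])
    counts = [[0] * m for _ in range(n)]  # times item j was given to agent i
    sums = [[0.0] * m for _ in range(n)]  # total observed value of those draws
    utility = [0.0] * n                   # realized cumulative utility per agent

    for t in range(1, T + 1):
        # Tentative utilities start from what each agent has received so far.
        tentative = utility[:]
        assignment = []
        for j in range(m):
            # Hypothetical greedy rule: give item j to the "poorest" agent.
            i = min(range(n), key=lambda a: tentative[a])
            assignment.append(i)
            # Optimistic (UCB-style) estimate of agent i's value for item j.
            if counts[i][j] == 0:
                ucb = 1.0
            else:
                mean = sums[i][j] / counts[i][j]
                bonus = math.sqrt(2.0 * math.log(t) / counts[i][j])
                ucb = min(1.0, mean + bonus)
            tentative[i] += ucb
        # Semi-bandit feedback: observe only the assigned (agent, item) values.
        for j, i in enumerate(assignment):
            value = 1.0 if rng.random() < mu[i][j] else 0.0
            counts[i][j] += 1
            sums[i][j] += value
            utility[i] += value
    return utility


if __name__ == "__main__":
    mu = [[0.9, 0.2, 0.5], [0.3, 0.8, 0.5]]  # hypothetical instance: n=2, m=3
    totals = run_bmmfa(mu, T=1000)
    print("cumulative utilities:", totals)
    print("max-min reward:", min(totals))
```

Because this rule ignores which agent values an item most, it is only a baseline for illustrating the feedback model and the non-additive objective; regret against the best fixed allocation would be measured on `min(totals)`.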

