LG AI ML

Nov 15, 2023

Learning Fair Division from Bandit Feedback

Hakuei Yamada, Junpei Komiyama, Kenshi Abe, Atsushi Iwasaki

arXiv:2311.09068v112.310 citations

h-index: 14

Originality: Incremental advance

AI Analysis

This paper addresses fair resource allocation in dynamic, uncertain environments such as online platforms. The advance is incremental because it builds on existing fair division and bandit learning frameworks.

The paper tackles online fair division under uncertainty, where a central planner allocates items sequentially without precise knowledge of agents' values. It shows that the proposed algorithms asymptotically achieve optimal Nash social welfare in linear Fisher markets with additive utilities, supported by regret bounds and empirical validation on synthetic and empirical datasets.

This work addresses learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents' values or utilities. Departing from conventional online algorithms, the planner here relies on noisy, estimated values obtained after allocating items. We introduce wrapper algorithms utilizing dual averaging, enabling gradual learning of both the type distribution of arriving items and agents' values through bandit feedback. This approach enables the algorithms to asymptotically achieve optimal Nash social welfare in linear Fisher markets with agents having additive utilities. We establish regret bounds in Nash social welfare and empirically validate the superior performance of our proposed algorithms across synthetic and empirical datasets.
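
To make the idea in the abstract more concrete, the following is a minimal sketch of how a dual-averaging allocation rule might be combined with bandit feedback. It is not the paper's algorithm. It assumes a finite set of item types, Gaussian observation noise, a UCB-style optimism bonus, and a pacing-style update in which each agent's multiplier is its budget divided by its running average utility, clipped to a fixed interval. All function names, parameters, and the exploration rule are hypothetical choices made for illustration.

```python
import math
import random


def nash_social_welfare(utilities, budgets):
    """Budget-weighted geometric mean of utilities (0.0 if any utility is not positive)."""
    if any(u <= 0 for u in utilities):
        return 0.0
    total = sum(budgets)
    return math.exp(sum(b * math.log(u) for u, b in zip(utilities, budgets)) / total)


def run_pacing_with_bandit_feedback(true_values, type_probs, horizon,
                                    noise=0.1, delta=0.05, seed=0):
    """Hypothetical sketch: dual-averaging pacing driven by optimistic value estimates.

    true_values[i][k] is agent i's mean value for item type k. The planner never
    reads it directly; it is used only to simulate noisy feedback and to report
    the realized utilities at the end.
    """
    rng = random.Random(seed)
    n, k = len(true_values), len(type_probs)
    budgets = [1.0 / n] * n
    beta = [1.0 + delta] * n                  # pacing multipliers (dual variables)
    est_avg_util = [0.0] * n                  # planner's running average of observed utility
    true_avg_util = [0.0] * n                 # ground truth, for evaluation only
    counts = [[0] * k for _ in range(n)]      # observations per (agent, item type)
    means = [[0.0] * k for _ in range(n)]     # empirical mean value per (agent, item type)

    for t in range(1, horizon + 1):
        # An item type arrives, drawn from a distribution the planner does not know.
        item = rng.choices(range(k), weights=type_probs)[0]

        def optimistic(i):
            # UCB-style estimate (an assumption): unseen pairs are tried first.
            c = counts[i][item]
            if c == 0:
                return float("inf")
            return means[i][item] + math.sqrt(2.0 * math.log(t + 1) / c)

        # Allocate to the agent with the highest paced, optimistic value.
        winner = max(range(n), key=lambda i: beta[i] * optimistic(i))

        # Bandit feedback: only the winner's noisy value is observed.
        observed = true_values[winner][item] + rng.gauss(0.0, noise)
        counts[winner][item] += 1
        means[winner][item] += (observed - means[winner][item]) / counts[winner][item]

        for i in range(n):
            est_gain = observed if i == winner else 0.0
            true_gain = true_values[i][item] if i == winner else 0.0
            est_avg_util[i] += (est_gain - est_avg_util[i]) / t
            true_avg_util[i] += (true_gain - true_avg_util[i]) / t
            # Dual-averaging step: multiplier = budget / average utility, clipped.
            low, high = budgets[i] / (1.0 + delta), 1.0 + delta
            raw = budgets[i] / est_avg_util[i] if est_avg_util[i] > 0 else high
            beta[i] = min(high, max(low, raw))

    return true_avg_util, beta, budgets


if __name__ == "__main__":
    values = [[1.0, 0.2], [0.2, 1.0]]         # two agents, two item types (made-up numbers)
    utils, beta, budgets = run_pacing_with_bandit_feedback(values, [0.5, 0.5], horizon=2000)
    print("average utilities:", utils)
    print("Nash social welfare:", nash_social_welfare(utils, budgets))
```

The multipliers play the role of the dual variables: an agent who has received little utility so far gets a larger multiplier and is therefore more likely to win the next item, which is what pushes the allocation toward a high Nash social welfare rather than a high sum of utilities.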
