LG AI CY MAFeb 8, 2024

Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits

Subham Pokhriyal, Shweta Jain, Ganesh Ghalme, Swapnil Dhamal, Sujit Gujar

arXiv:2402.05575v14.62 citationsh-index: 24Has CodeAAMAS

Originality Incremental advance

AI Analysis

This addresses fairness in decision-making for applications like recommendation systems, though it is incremental as it builds on existing UCB-based methods.

The paper tackles the problem of ensuring fairness in stochastic multi-armed bandits by proposing Bi-Level Fairness, which guarantees minimum exposure to groups and meritocratic fairness within groups, achieving an O(√T) regret bound and showing improved exposure guarantees without significant reward loss compared to UCB.

Existing approaches to fairness in stochastic multi-armed bandits (MAB) primarily focus on exposure guarantee to individual arms. When arms are naturally grouped by certain attribute(s), we propose Bi-Level Fairness, which considers two levels of fairness. At the first level, Bi-Level Fairness guarantees a certain minimum exposure to each group. To address the unbalanced allocation of pulls to individual arms within a group, we consider meritocratic fairness at the second level, which ensures that each arm is pulled according to its merit within the group. Our work shows that we can adapt a UCB-based algorithm to achieve a Bi-Level Fairness by providing (i) anytime Group Exposure Fairness guarantees and (ii) ensuring individual-level Meritocratic Fairness within each group. We first show that one can decompose regret bounds into two components: (a) regret due to anytime group exposure fairness and (b) regret due to meritocratic fairness within each group. Our proposed algorithm BF-UCB balances these two regrets optimally to achieve the upper bound of $O(\sqrt{T})$ on regret; $T$ being the stopping time. With the help of simulated experiments, we further show that BF-UCB achieves sub-linear regret; provides better group and individual exposure guarantees compared to existing algorithms; and does not result in a significant drop in reward with respect to UCB algorithm, which does not impose any fairness constraint.

View on arXiv PDF Code

Similar