AI · Aug 30, 2024

Strategic Arms with Side Communication Prevail Over Low-Regret MAB Algorithms

Ahmed Ben Yahmed, Clément Calauzènes, Vianney Perchet

Peking U

arXiv:2408.17101v1

8.515 citations

h-index: 19

Originality: Incremental advance

AI Analysis

This paper addresses a problem in game theory and online learning involving strategic agents, but it appears incremental: it extends prior work on the perfect-information setting to a setting with side communication among arms.

The paper studies strategic multi-armed bandits in which the arms share information with one another. It shows that, even without public information, the arms can establish an equilibrium in which they retain most of their value and inflict linear regret on the player. The key challenge is designing a communication protocol under which the arms report truthfully.

In the strategic multi-armed bandit setting, when arms possess perfect information about the player's behavior, they can establish an equilibrium where (1) they retain almost all of their value, and (2) they leave the player with a substantial (linear) regret. This study illustrates that, even if complete information is not publicly available to all arms but is shared among them, it is possible to achieve a similar equilibrium. The primary challenge lies in designing a communication protocol that incentivizes the arms to communicate truthfully.
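To make the "arms retain value, player suffers linear regret" outcome concrete, here is a minimal toy sketch in Python. It is not the paper's construction or communication protocol. The function `simulate`, the fixed `pass_fraction`, the explore-then-greedy player, and the benchmark (full value of the best arm every round) are all assumptions chosen for illustration only.

```python
def simulate(values, pass_fraction, horizon):
    """Toy strategic bandit (illustrative assumption, not the paper's model).

    Each arm i generates values[i] when pulled, passes only
    pass_fraction * values[i] to the player, and keeps the rest.
    The player pulls each arm once, then greedily pulls the arm with
    the highest observed average.
    """
    k = len(values)
    received = [0.0] * k   # reward the player has received from each arm
    pulls = [0] * k        # number of times each arm was pulled
    retained = [0.0] * k   # value each arm kept for itself
    player_total = 0.0

    for t in range(horizon):
        if t < k:
            arm = t  # initial exploration: one pull per arm
        else:
            # Greedy choice on observed averages (ties go to the lowest index).
            arm = max(range(k), key=lambda i: received[i] / pulls[i])
        passed = pass_fraction * values[arm]
        received[arm] += passed
        pulls[arm] += 1
        retained[arm] += values[arm] - passed
        player_total += passed

    # Assumed benchmark: the player collecting the best arm's full value each round.
    benchmark = max(values) * horizon
    regret = benchmark - player_total
    return player_total, retained, regret


if __name__ == "__main__":
    for horizon in (3000, 6000):
        total, kept, regret = simulate([0.9, 0.6, 0.3], 0.05, horizon)
        print(horizon, round(total, 3), round(sum(kept), 3), round(regret, 3))
```

Under these assumptions, doubling the horizon roughly doubles the player's regret, while the arms keep a fraction `1 - pass_fraction` of everything they generate. The hard part that the paper addresses, getting arms to sustain such an outcome as an equilibrium through truthful side communication, is not modeled here.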
