Hierarchies of No-regret Algorithms
This work clarifies the practical implications of regret hierarchy in game theory for algorithm designers and players, revealing that stronger theoretical guarantees do not always translate to better performance.
The paper investigates whether stronger regret guarantees (e.g., no-swap-regret) yield higher utility for players in two-player games compared to weaker ones (e.g., no-regret). Counterintuitively, no-swap-regret often performs worse due to slower learning rates, but equalizing learning rates closes the gap; however, in random games with 7 actions, no-swap-regret can outperform no-regret.
Our paper studies the setting of players using no-regret algorithms in various two-player games. We address whether having stronger regret guarantees or playing against an opponent with weaker regret guarantees yields higher utilities for the player in question. We consider a hierarchy of algorithms from weakest to strongest: uniform random play, no-regret, and no-swap-regret. We find, counterintuitively, that in many games, no-swap-regret is a worse choice for players (and gives better utility for their opponents). We find the root cause of this phenomenon to be a difference in effective learning rate between the two algorithms, where the no-swap-regret algorithms learn $N$ times slower than no-regret algorithms. To address this, we attempt to equalize learning rates, leading to closer utility between no-regret and no-swap-regret players. Finally, we show that for certain random games with $7$ actions per player, no-swap-regret algorithms can perform noticeably better than no-regret algorithms in a manner that cannot be explained away by unfairly adjusted learning rates.