Balancing Adaptability and Non-exploitability in Repeated Games
This work addresses the challenge of balancing adaptability and fairness in multi-agent learning, providing the first guarantees for both regret and non-exploitability, which is foundational for applications like online platforms and game theory.
The paper tackles the problem of achieving low regret in repeated games against opponents of unknown types while ensuring non-exploitability, where the opponent lacks incentive to use exploitative strategies. It introduces the LAFF algorithm, which guarantees sublinear regret against non-exploitative opponents and ensures linear regret for exploitative ones.
We study the problem of guaranteeing low regret in repeated games against an opponent with unknown membership in one of several classes. We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some "fair" value. Our solution is an expert algorithm (LAFF) that searches within a set of sub-algorithms that are optimal for each opponent class and uses a punishment policy upon detecting evidence of exploitation by the opponent. With benchmarks that depend on the opponent class, we show that LAFF has sublinear regret uniformly over the possible opponents, except exploitative ones, for which we guarantee that the opponent has linear regret. To our knowledge, this work is the first to provide guarantees for both regret and non-exploitability in multi-agent learning.