AI | GT | Apr 23, 2021

Elo Ratings for Large Tournaments of Software Agents in Asymmetric Games

arXiv:2105.00839v1 | 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses fair performance evaluation for AI agents in complex, asymmetric games. The contribution is incremental in that it modifies an existing rating system (Elo).

The paper adapts the Elo rating system to the evaluation of AI agents in asymmetric games. It presents a revised rating system and tournament guidelines that account for differences between AI and human players, such as the far larger number of games an AI is trained on and the lack of prior information about newly created agents.

The Elo rating system has been used worldwide for individual sports and team sports, as exemplified by the European Go Federation (EGF), International Chess Federation (FIDE), International Federation of Association Football (FIFA), and many others. To evaluate the performance of artificial intelligence agents, it is natural to evaluate them on the same Elo scale as humans, such as the rating of 5185 attributed to AlphaGo Zero. There are several fundamental differences between humans and AI that suggest modifications to the system, which in turn require revisiting Elo's fundamental rationale. AI is typically trained on many more games than humans play, and we have little a priori information on newly created AI agents. Further, AI is being extended into games that are asymmetric between the players, and that could even have large, complex boards with a different setup in every game, such as commercial paper strategy games. We present a revised rating system, and guidelines for tournaments, to reflect these differences.
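
For orientation, the classical Elo update that the abstract takes as its starting point can be sketched as follows. This is the standard textbook formula, not the revised system the paper proposes. The function names, the K-factor of 32, and the 400-point scale constant are illustrative assumptions and are not taken from the paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classical Elo expected score of player A against player B.

    Uses the conventional logistic curve with a 400-point scale
    (an assumption here; the paper's revised system may differ).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float,
                   score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Return the updated (rating_a, rating_b) after one game.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    k is an illustrative K-factor, not a value from the paper.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # The classical update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta
```

With two agents both rated 1500, a win for the first moves the ratings to 1516 and 1484 under these assumed parameters. Note that this symmetric form treats both sides of the game identically, which is the assumption the paper revisits for asymmetric games.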

