The Evaluation of Rating Systems in Team-based Battle Royale Games
This work addresses a gap in how rating systems for online competitive games are evaluated. The contribution is incremental: it focuses on improving assessment methods rather than the rating systems themselves.
The paper evaluates rating systems in team-based battle royale games by testing several metrics with three popular rating systems on a real-world dataset of over 25,000 matches. It finds that normalized discounted cumulative gain (NDCG) is more reliable and more flexible than the other metrics studied.
Online competitive games have become a mainstream entertainment platform. To create a fair and exciting experience, these games use rating systems to match players with similar skills. While there has been an increasing amount of research on improving the performance of these systems, less attention has been paid to how their performance is evaluated. In this paper, we explore the utility of several metrics for evaluating three popular rating systems on a real-world dataset of over 25,000 team battle royale matches. Our results suggest considerable differences in their evaluation patterns. Some metrics were highly impacted by the inclusion of new players. Many could not capture the real differences between certain groups of players. Among all metrics studied, normalized discounted cumulative gain (NDCG) demonstrated more reliable performance and more flexibility. It alleviated most of the challenges faced by the other metrics while adding the freedom to adjust the focus of the evaluations on different groups of players.
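To make the NDCG metric concrete, the following is a minimal sketch of how it could score a predicted team ranking against the observed placements of a single match. It uses the standard log2-discounted formulation. The relevance assignment (the winner receives a gain of n, the last-placed team a gain of 1) and the names `dcg`, `ndcg`, `predicted_order`, and `observed_rank` are illustrative assumptions, not the paper's exact setup.

```python
import math


def dcg(gains):
    """Standard DCG: the gain at 1-indexed position i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(predicted_order, observed_rank):
    """Score one match.

    predicted_order: team ids sorted best-first by predicted rating.
    observed_rank:   dict mapping team id -> observed placement (1 = winner).
    """
    n = len(predicted_order)
    # Assumed relevance scheme: a better observed placement gives a larger gain.
    gains = [n - observed_rank[team] + 1 for team in predicted_order]
    # The ideal ordering places the largest gains first.
    ideal_dcg = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical three-team match: "a" won, "c" placed last.
observed = {"a": 1, "b": 2, "c": 3}
perfect = ndcg(["a", "b", "c"], observed)   # prediction matches the outcome: 1.0
reversed_ = ndcg(["c", "b", "a"], observed)  # fully reversed prediction scores lower
```

Because the discount shrinks with position, errors near the top of the ranking cost more than errors near the bottom. Changing the gain assignment is one plausible way to shift the focus of an evaluation toward particular groups of players, although the mechanism the paper actually uses may differ.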