Large-scale Analysis of Chess Games with Chess Engines: A Preliminary Report
This work provides a dataset for applications such as cheating detection and skill assessment, but its contribution is incremental: it focuses on data collection and scalability rather than on new methods.
The authors tackled the challenge of analyzing chess games at large scale by processing almost 5 million games and 270 million positions with the Stockfish engine, generating over 1 terabyte of evaluation data, a workload estimated at 50 years of computation on a single machine.
The strength of chess engines, together with the availability of numerous chess games, has attracted the attention of chess players, data scientists, and researchers over the last decades. State-of-the-art engines now provide an authoritative judgement that can be used in many applications such as cheating detection, intrinsic ratings computation, skill assessment, or the study of human decision-making. A key issue for the research community is to gather a large dataset of chess games together with the judgement of chess engines. Unfortunately, analysing each move takes a lot of time. In this paper, we report our effort to analyse almost 5 million chess games with a computing grid. During summer 2015, we processed 270 million unique played positions using the Stockfish engine at a fairly high search depth (20). We populated a database of more than 1 terabyte of chess evaluations, representing an estimated 50 years of computation on a single machine. Our effort is a first step towards the replication of research results, the supply of open data and procedures for exploring new directions, and the investigation of software engineering and scalability issues when computing billions of moves.
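To give a feel for the scale reported above, the following minimal sketch derives a few per-position figures from the abstract's rounded numbers. The inputs are approximations ("almost 5 million" games is taken as 5 million, "1+" terabyte as a decimal terabyte), and the 90-day wall-clock window and the assumption of perfect parallelism in `machines_needed` are illustrative choices, not figures from the report.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Rounded figures taken from the abstract.
GAMES = 5_000_000          # "almost 5 million" games, so an upper bound
POSITIONS = 270_000_000    # unique played positions
CPU_YEARS = 50             # estimated single-machine computation time
STORAGE_BYTES = 1e12       # "1+" terabyte, assumed decimal, so a lower bound


def summarize(games, positions, cpu_years, storage_bytes):
    """Return rough per-position and per-game averages."""
    total_seconds = cpu_years * SECONDS_PER_YEAR
    return {
        "seconds_per_position": total_seconds / positions,
        "positions_per_game": positions / games,
        "bytes_per_position": storage_bytes / positions,
    }


def machines_needed(cpu_years, wall_clock_days):
    """Machines required to finish within wall_clock_days.

    Assumes perfect parallelism and identical machines (a simplification).
    """
    return math.ceil(cpu_years * 365.25 / wall_clock_days)


if __name__ == "__main__":
    for name, value in summarize(GAMES, POSITIONS, CPU_YEARS, STORAGE_BYTES).items():
        print(f"{name}: {value:.1f}")
    # 90 days is a hypothetical stand-in for "summer 2015".
    print("machines for a 90-day run:", machines_needed(CPU_YEARS, 90))
```

Under these assumptions, the averages come out to roughly 6 seconds of engine time and a few kilobytes of stored evaluation per position, which is consistent with the need for a computing grid rather than a single machine.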