Mastering Chess with a Transformer Model
This work addresses the challenge of building computationally efficient chess AI, a problem of interest to both researchers and practitioners. It shows that domain-specific enhancements can reduce computational requirements, although its contribution to transformer applications more broadly is incremental.
The paper applies transformer models to chess, focusing on the position representation used within the attention mechanism. The resulting model matches or exceeds existing chess-playing models at significantly lower computational cost: it outperforms AlphaZero with 8x less computation and matches prior grandmaster-level transformer-based agents with 30x less computation.
Transformer models have demonstrated impressive capabilities when trained at scale, excelling at difficult cognitive tasks requiring complex reasoning and rational decision-making. In this paper, we explore the application of transformers to chess, focusing on the critical role of the position representation within the attention mechanism. We show that transformers endowed with a sufficiently expressive position representation can match existing chess-playing models at a fraction of the computational cost. Our architecture, which we call the Chessformer, significantly outperforms AlphaZero in both playing strength and puzzle solving ability with 8x less computation and matches prior grandmaster-level transformer-based agents in those metrics with 30x less computation. Our models also display an understanding of chess dissimilar and orthogonal to that of top traditional engines, detecting high-level positional features like trapped pieces and fortresses that those engines struggle with. This work demonstrates that domain-specific enhancements can in large part replace the need for model scale, while also highlighting that deep learning can make strides even in areas dominated by search-based methods.
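To make the idea of a position representation inside the attention mechanism concrete, the following is a minimal sketch and not the Chessformer's actual scheme, which the abstract does not specify. It assumes one token per square and a hypothetical bias table indexed by the two-dimensional (rank, file) offset between the query square and the key square; the bias is added to the scaled dot-product logits before the softmax. All names (`offset_index`, `attention_weights`, `bias_table`) are illustrative, and the random values stand in for parameters that would be learned.

```python
import math
import random

BOARD = 8                          # 8x8 chessboard
N_SQUARES = BOARD * BOARD          # 64 tokens, one per square (assumption)
N_OFFSETS = (2 * BOARD - 1) ** 2   # 15 * 15 = 225 distinct (rank, file) offsets


def offset_index(query_sq: int, key_sq: int) -> int:
    """Map a pair of squares to the index of their 2D (rank, file) offset."""
    q_rank, q_file = divmod(query_sq, BOARD)
    k_rank, k_file = divmod(key_sq, BOARD)
    d_rank = k_rank - q_rank + (BOARD - 1)   # shifted into 0..14
    d_file = k_file - q_file + (BOARD - 1)   # shifted into 0..14
    return d_rank * (2 * BOARD - 1) + d_file


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, bias_table):
    """Scaled dot-product attention weights with an additive relative bias."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    weights = []
    for q_sq, q in enumerate(queries):
        logits = [
            scale * sum(a * b for a, b in zip(q, k))
            + bias_table[offset_index(q_sq, k_sq)]
            for k_sq, k in enumerate(keys)
        ]
        weights.append(softmax(logits))
    return weights


# Toy inputs: random vectors stand in for learned projections of square tokens.
rng = random.Random(0)
dim = 4
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(N_SQUARES)]
keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(N_SQUARES)]
bias_table = [rng.gauss(0, 1) for _ in range(N_OFFSETS)]  # learned in practice

weights = attention_weights(queries, keys, bias_table)  # 64 x 64, rows sum to 1
```

The reason for a two-dimensional offset in this sketch is board geometry: with squares flattened to indices 0..63, h1 to a2 and a1 to b1 both differ by one index, yet only the second pair is adjacent on the board. Indexing the bias by (rank, file) offset keeps those cases distinct, which is one simple way a position representation can be made more expressive for chess.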