LG | AI | Feb 4

Mixture of Masters: Sparse Chess Language Models with Player Routing

arXiv:2602.04447v1 | 1 citation | h-index: 13
AI Analysis

This work addresses homogenization in chess AI, a concern for both players and researchers. It offers improved performance and stylistic variety, though the contribution is incremental because it builds on existing mixture-of-experts and language model techniques.

The paper tackled the problem of monolithic chess language models collapsing into mode-averaged behavior by introducing Mixture-of-Masters (MoM), a sparse mixture-of-experts model with GPT experts emulating grandmasters, which outperformed dense networks and GPT baselines against Stockfish on unseen games.

Modern chess language models are dense transformers trained on millions of games played by thousands of high-rated individuals. However, these monolithic networks tend to collapse into mode-averaged behavior, where stylistic boundaries are blurred, and rare but effective strategies are suppressed. To counteract homogenization, we introduce Mixture-of-Masters (MoM), the first chess mixture-of-experts model with small-sized GPT experts emulating world-class grandmasters. Each expert is trained with a combination of self-supervised learning and reinforcement learning guided by chess-specific rewards. For each move, a post-hoc learnable gating network selects the most appropriate persona to channel depending on the game state, allowing MoM to switch its style dynamically (e.g., Tal's offensive vocation or Petrosian's defensive solidity). When evaluated against Stockfish on unseen standard games, MoM outperforms both dense individual expert networks and popular GPT baselines trained on aggregated data, while ensuring generation variety, control, and interpretability.
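
The per-move routing described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the linear gate, its fixed weights, the two-element feature vector (attacking chances, king danger), and the lambda "experts" are all hypothetical stand-ins chosen for illustration. In MoM the experts are GPT models and the gate is learned after the experts are trained; here the gate only shows the top-1 selection step.

```python
import math
from typing import Callable, Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class Top1Gate:
    """Hypothetical linear gate: one weight vector per expert persona."""

    def __init__(self, weights: Dict[str, List[float]]) -> None:
        self.weights = weights

    def probabilities(self, features: Sequence[float]) -> Dict[str, float]:
        # Score each persona with a dot product, then normalize.
        names = list(self.weights)
        scores = [
            sum(w * x for w, x in zip(self.weights[name], features))
            for name in names
        ]
        return dict(zip(names, softmax(scores)))

    def route(self, features: Sequence[float]) -> str:
        # Sparse (top-1) routing: only the best-scoring expert is queried.
        probs = self.probabilities(features)
        return max(probs, key=lambda name: probs[name])


def play_move(
    gate: Top1Gate,
    experts: Dict[str, Callable[[List[str]], str]],
    features: Sequence[float],
    history: List[str],
) -> tuple:
    """Pick a persona for the current game state and ask it for a move."""
    persona = gate.route(features)
    return persona, experts[persona](history)


# Stand-in experts (assumption): real experts would be GPT models
# that continue the move sequence; these return a fixed move.
EXPERTS = {
    "Tal": lambda history: "Bxh7+",
    "Petrosian": lambda history: "Kh1",
}

# Assumed features: [attacking_chances, king_danger]; weights are made up.
GATE = Top1Gate({"Tal": [2.0, -1.0], "Petrosian": [-1.0, 2.0]})

if __name__ == "__main__":
    print(play_move(GATE, EXPERTS, [0.9, 0.1], ["e4", "e5"]))
    print(play_move(GATE, EXPERTS, [0.1, 0.9], ["e4", "e5"]))
```

Because the gate exposes a probability for each persona at every move, the chosen expert can be logged alongside the move, which is one way the interpretability and control mentioned in the abstract could be surfaced.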
