LGSep 27, 2025

CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning

Prashant Govindarajan, Mathieu Reymond, Antoine Clavaud, Mariano Phielipp, Santiago Miret, Sarath Chandar

arXiv:2509.23156v14.11 citationsh-index: 21Has Code

Originality Synthesis-oriented

AI Analysis

This provides a new benchmark for reinforcement learning researchers and material scientists to address real-world design problems with practical applications, though it is incremental as it builds on existing RL methods applied to a specific domain.

The authors tackled the problem of accelerating materials discovery by introducing CrystalGym, a reinforcement learning environment that uses direct density functional theory (DFT) signals as feedback, and benchmarked algorithms for optimizing properties like band gap and bulk modulus, showing varying sample efficiencies and convergence but no algorithm solved all tasks.

In silico design and optimization of new materials primarily relies on high-accuracy atomic simulators that perform density functional theory (DFT) calculations. While recent works showcase the strong potential of machine learning to accelerate the material design process, they mostly consist of generative approaches that do not use direct DFT signals as feedback to improve training and generation mainly due to DFT's high computational cost. To aid the adoption of direct DFT signals in the materials design loop through online reinforcement learning (RL), we propose CrystalGym, an open-source RL environment for crystalline material discovery. Using CrystalGym, we benchmark common value- and policy-based reinforcement learning algorithms for designing various crystals conditioned on target properties. Concretely, we optimize for challenging properties like the band gap, bulk modulus, and density, which are directly calculated from DFT in the environment. While none of the algorithms we benchmark solve all CrystalGym tasks, our extensive experiments and ablations show different sample efficiencies and ease of convergence to optimality for different algorithms and environment settings. Additionally, we include a case study on the scope of fine-tuning large language models with reinforcement learning for improving DFT-based rewards. Our goal is for CrystalGym to serve as a test bed for reinforcement learning researchers and material scientists to address these real-world design problems with practical applications. We therefore introduce a novel class of challenges for reinforcement learning methods dealing with time-consuming reward signals, paving the way for future interdisciplinary research for machine learning motivated by real-world applications.

View on arXiv PDF

Similar