MTRL-SCI LG ML Jun 14, 2025

Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials

Thorben Prein, Elton Pan, Janik Jehkul, Steffen Weinmann, Elsa A. Olivetti, Jennifer L. M. Rupp

arXiv:2506.12557v15.18 citations h-index: 6 ACS Applied Materials and Interfaces

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in inorganic materials synthesis, offering researchers a scalable, data-efficient hybrid approach. Its advance is incremental: it combines off-the-shelf language models with the training of a specialized model.

The paper tackles inorganic synthesis planning by using off-the-shelf language models to predict synthesis conditions, achieving up to 53.8% Top-1 precursor-prediction accuracy and mean absolute errors below 126 °C for calcination and sintering temperatures. It then uses these models to generate synthetic recipes for pretraining a specialized transformer model, SyntMTE, which reduces the mean absolute error to 73 °C for sintering temperature and 98 °C for calcination temperature, an improvement of up to 8.7% over baselines trained exclusively on experimental data.

Inorganic synthesis planning currently relies primarily on heuristic approaches or machine-learning models trained on limited datasets, which constrains its generality. We demonstrate that language models, without task-specific fine-tuning, can recall synthesis conditions. Off-the-shelf models, such as GPT-4.1, Gemini 2.0 Flash and Llama 4 Maverick, achieve a Top-1 precursor-prediction accuracy of up to 53.8 % and a Top-5 performance of 66.1 % on a held-out set of 1,000 reactions. They also predict calcination and sintering temperatures with mean absolute errors below 126 °C, matching specialized regression methods. Ensembling these language models further enhances predictive accuracy and reduces inference cost per prediction by up to 70 %. We subsequently employ language models to generate 28,548 synthetic reaction recipes, which we combine with literature-mined examples to pretrain a transformer-based model, SyntMTE. After fine-tuning on the combined dataset, SyntMTE reduces mean-absolute error in sintering temperature prediction to 73 °C and in calcination temperature to 98 °C. This strategy improves models by up to 8.7 % compared with baselines trained exclusively on experimental data. Finally, in a case study on Li7La3Zr2O12 solid-state electrolytes, we demonstrate that SyntMTE reproduces the experimentally observed dopant-dependent sintering trends. Our hybrid workflow enables scalable, data-efficient inorganic synthesis planning.
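The metrics quoted in the abstract (Top-k precursor accuracy, mean absolute error, and an ensemble of several models) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes that a prediction counts as correct only when the predicted precursor set exactly matches the reference set, and that the ensemble is a plain average of the per-model temperature predictions; the abstract specifies neither. The function names and the toy data are hypothetical.

```python
from statistics import mean


def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of reactions whose reference precursor set is among the first k candidates.

    Assumption: exact set match, ignoring the order of precursors.
    """
    hits = 0
    for candidates, target in zip(ranked_predictions, targets):
        if frozenset(target) in [frozenset(c) for c in candidates[:k]]:
            hits += 1
    return hits / len(targets)


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed temperatures."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def ensemble_mean(per_model_predictions):
    """Average the predictions of several models, reaction by reaction (assumed scheme)."""
    return [mean(values) for values in zip(*per_model_predictions)]


# Toy data invented for illustration; not taken from the paper.
ranked = [
    [["Li2CO3", "La2O3", "ZrO2"], ["LiOH", "La2O3", "ZrO2"]],
    [["BaCO3", "TiO2"], ["BaO", "TiO2"]],
]
truth = [["LiOH", "La2O3", "ZrO2"], ["BaCO3", "TiO2"]]

top1 = top_k_accuracy(ranked, truth, 1)
top2 = top_k_accuracy(ranked, truth, 2)

model_a = [1200.0, 900.0]   # hypothetical sintering temperatures in °C
model_b = [1100.0, 1000.0]
observed = [1150.0, 1000.0]

ensemble = ensemble_mean([model_a, model_b])
mae_a = mean_absolute_error(model_a, observed)
mae_ensemble = mean_absolute_error(ensemble, observed)
```

In this toy case the averaged prediction has a lower error than model A alone, which is the effect the abstract attributes to ensembling. The weighting and model selection actually used in the paper may differ.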
