CL | Feb 23, 2025

Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge

arXiv:2502.16457v4 | 2 citations | h-index: 26 | CIKM
Originality: Incremental advance
AI Analysis

This work provides a practical, data-driven resource that helps the materials science community design experiments more efficiently. The advance is incremental, since it applies existing LLM methods to a new domain.

The authors tackled the problem of automating materials discovery by curating a dataset of 17K expert-verified synthesis recipes and developing AlchemyBench, a benchmark for LLM-based synthesis prediction, with their LLM-as-a-Judge framework showing strong statistical agreement with expert assessments.

Materials synthesis is vital for innovations such as energy storage, catalysis, electronics, and biomedical devices. Yet, the process relies heavily on empirical, trial-and-error methods guided by expert intuition. Our work aims to support the materials science community by providing a practical, data-driven resource. We have curated a comprehensive dataset of 17K expert-verified synthesis recipes from open-access literature, which forms the basis of our newly developed benchmark, AlchemyBench. AlchemyBench offers an end-to-end framework that supports research in large language models applied to synthesis prediction. It encompasses key tasks, including raw materials and equipment prediction, synthesis procedure generation, and characterization outcome forecasting. We propose an LLM-as-a-Judge framework that leverages large language models for automated evaluation, demonstrating strong statistical agreement with expert assessments. Overall, our contributions offer a supportive foundation for exploring the capabilities of LLMs in predicting and guiding materials synthesis, ultimately paving the way for more efficient experimental design and accelerated innovation in materials science.
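The abstract reports "strong statistical agreement" between the LLM judge and expert assessments without naming the statistic. As a minimal sketch of how such agreement could be checked, the example below computes Pearson and Spearman correlations between two score lists. The choice of statistics, the function names, and the scores are all illustrative assumptions, not the authors' evaluation protocol or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical scores for the same five recipes (made up for illustration).
    expert_scores = [4.0, 3.5, 2.0, 5.0, 3.0]
    judge_scores = [4.5, 3.0, 2.5, 5.0, 3.5]
    print("Pearson:", round(pearson(expert_scores, judge_scores), 3))
    print("Spearman:", round(spearman(expert_scores, judge_scores), 3))
```

Spearman is included alongside Pearson because judge scores are often ordinal ratings, where agreement on the ranking of recipes matters more than agreement on exact values.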

