MTRL-SCI · LG · Jul 9, 2025

Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning

arXiv:2507.07293v1 · 1 citation · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the growing experimental workload and expanding body of knowledge in chemistry and materials science, offering an incremental improvement through automated data extraction and property prediction.

The paper aims to accelerate chemistry and materials science research in two steps: it uses large language models (LLMs) to automatically extract thermodynamic data from the literature, then trains a machine learning model on that data to predict thermodynamic parameters such as the enthalpy of formation of minerals. The authors report accurate predictions.

New discoveries in chemistry and materials science, with their ever-expanding volume of requisite knowledge and experimental workload, provide unique opportunities for machine learning (ML) to play a critical role in accelerating research. Here, we demonstrate (1) the use of large language models (LLMs) for automated literature reviews, and (2) the training of an ML model to predict chemical knowledge (thermodynamic parameters). Our LLM-based literature review tool (LMExt) successfully extracted chemical and non-chemical information into a machine-readable structure, including stability constants for metal cation-ligand interactions, thermodynamic properties, and broader data types (medical research papers and financial reports), effectively overcoming the challenges inherent in each domain. Using the automatically acquired thermodynamic data, an ML model was trained with the CatBoost algorithm to accurately predict thermodynamic parameters (e.g., enthalpy of formation) of minerals. This work highlights the transformative potential of integrated ML approaches to reshape chemistry and materials science research.
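
To make the idea of a "machine-readable structure" concrete, here is a minimal Python sketch of how extracted values might be validated before model training. It is an illustration under stated assumptions, not the LMExt implementation: the `ThermoRecord` schema, the `parse_records` helper, the JSON field names, and the sample values are all hypothetical and do not come from the paper.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ThermoRecord:
    """One extracted data point (hypothetical schema, not the paper's)."""
    compound: str
    property_name: str
    value: float
    unit: str


def parse_records(llm_output: str) -> List[ThermoRecord]:
    """Parse a JSON array emitted by an extraction model, skipping malformed rows."""
    records = []
    for row in json.loads(llm_output):
        try:
            records.append(ThermoRecord(
                compound=str(row["compound"]),
                property_name=str(row["property"]),
                value=float(row["value"]),
                unit=str(row["unit"]),
            ))
        except (KeyError, TypeError, ValueError):
            # Drop rows with missing keys or non-numeric values.
            continue
    return records


# Illustrative input only: placeholder names and values, not data from the paper.
sample = '''[
  {"compound": "mineral_A", "property": "enthalpy_of_formation", "value": "-100.5", "unit": "kJ/mol"},
  {"compound": "mineral_B", "property": "enthalpy_of_formation", "value": "n/a", "unit": "kJ/mol"},
  {"compound": "mineral_C", "property": "enthalpy_of_formation", "unit": "kJ/mol"}
]'''

records = parse_records(sample)
```

Rows that pass this kind of check could then be turned into a feature table for a gradient-boosting regressor such as CatBoost; the features the authors actually used are not stated in the abstract.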

