MTRL-SCI, LG · Sep 25, 2025

Automated Machine Learning Pipeline for Training and Analysis Using Large Language Models

arXiv:2509.21647v1 · 3 citations · h-index: 74 · J Chem Theory Comput
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of automating the development of machine learning interatomic potentials (MLIPs) for molecular simulations. It offers a domain-specific tool whose novelty is incremental: it integrates large-language-model agents into an existing pipeline.

The authors tackled the difficulty of developing reliable machine learning interatomic potentials by introducing an Automated Machine Learning Pipeline (AMLP) that unifies the workflow from dataset creation to model validation, achieving mean absolute errors of ~1.7 meV/atom in energies and ~7.0 meV/Å in forces on acridine polymorphs.
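The accuracy figures above are mean absolute errors (MAEs). The sketch below shows how such numbers are commonly computed; it is not code from AMLP. It assumes total energies in eV and forces in eV/Å (ASE's default units), and it assumes each structure's energy error is normalized by its atom count before averaging, since the exact averaging convention is not stated here. The function names are hypothetical and the data are made up.

```python
"""Energy and force MAE in meV/atom and meV/Å (illustrative sketch).

Assumptions: energies are total energies per structure in eV,
forces are per-atom Cartesian vectors in eV/Å.
"""
from typing import Sequence

Vec3 = Sequence[float]


def energy_mae_mev_per_atom(e_pred: Sequence[float],
                            e_ref: Sequence[float],
                            n_atoms: Sequence[int]) -> float:
    # Normalize each total-energy error by that structure's atom count,
    # average the absolute values, then convert eV -> meV.
    if not (len(e_pred) == len(e_ref) == len(n_atoms)):
        raise ValueError("inputs must have equal length")
    errs = [abs(p - r) / n for p, r, n in zip(e_pred, e_ref, n_atoms)]
    return 1000.0 * sum(errs) / len(errs)


def force_mae_mev_per_ang(f_pred: Sequence[Sequence[Vec3]],
                          f_ref: Sequence[Sequence[Vec3]]) -> float:
    # Average the absolute error over every Cartesian component of every
    # atom in every structure, then convert eV/Å -> meV/Å.
    diffs = [abs(p - r)
             for sp, sr in zip(f_pred, f_ref)
             for ap, ar in zip(sp, sr)
             for p, r in zip(ap, ar)]
    return 1000.0 * sum(diffs) / len(diffs)


# Made-up example: a 2-atom and a 4-atom structure.
print(energy_mae_mev_per_atom([-10.000, -20.010], [-10.004, -20.000], [2, 4]))
```

Normalizing per atom keeps energy errors comparable across structures of different size, which matters when a dataset mixes unit cells or clusters with different atom counts.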

Machine learning interatomic potentials (MLIPs) have become powerful tools to extend molecular simulations beyond the limits of quantum methods, offering near-quantum accuracy at much lower computational cost. Yet developing reliable MLIPs remains difficult because it requires generating high-quality datasets, preprocessing atomic structures, and carefully training and validating models. In this work, we introduce an Automated Machine Learning Pipeline (AMLP) that unifies the entire workflow from dataset creation to model validation. AMLP employs large-language-model agents to assist with electronic-structure code selection, input preparation, and output conversion, while its analysis suite, AMLP-Analysis, which is based on the Atomic Simulation Environment (ASE), supports a range of molecular simulations. The pipeline is built on the MACE architecture and validated on acridine polymorphs, where straightforward fine-tuning of a foundation model achieves mean absolute errors of ~1.7 meV/atom in energies and ~7.0 meV/Å in forces. The fitted MLIP reproduces DFT geometries with sub-Å accuracy and remains stable in molecular dynamics simulations in the microcanonical and canonical ensembles.
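Stability in the microcanonical (NVE) ensemble is usually judged by how well the total energy is conserved along a trajectory. The sketch below is not AMLP-Analysis code: it is a minimal pure-Python stand-in that integrates with velocity Verlet and reports the largest total-energy deviation per atom. It assumes a hypothetical 1D harmonic chain in place of the fitted MACE potential, with arbitrary but consistent units; in an ASE workflow, an integrator such as `ase.md.verlet.VelocityVerlet` driving the MLIP calculator would play this role. The names `harmonic_chain` and `nve_energy_drift` are hypothetical.

```python
"""Toy NVE stability check: velocity Verlet plus energy-drift metric.

Assumption: a 1D chain of atoms joined by harmonic springs stands in
for a fitted MLIP; units are arbitrary but consistent.
"""
from typing import Callable, List, Sequence, Tuple

EnergyForces = Callable[[List[float]], Tuple[float, List[float]]]


def harmonic_chain(k: float, r0: float) -> EnergyForces:
    # Hypothetical stand-in potential: returns positions -> (energy, forces).
    def energy_forces(x: List[float]) -> Tuple[float, List[float]]:
        e = 0.0
        f = [0.0] * len(x)
        for i in range(len(x) - 1):
            d = x[i + 1] - x[i] - r0   # bond stretch relative to r0
            e += 0.5 * k * d * d
            f[i] += k * d              # F_i = -dE/dx_i
            f[i + 1] -= k * d          # F_{i+1} = -dE/dx_{i+1}
        return e, f
    return energy_forces


def nve_energy_drift(x: Sequence[float], v: Sequence[float],
                     m: Sequence[float], energy_forces: EnergyForces,
                     dt: float, n_steps: int) -> float:
    """Largest |E_total(t) - E_total(0)| per atom over the trajectory."""
    x, v = list(x), list(v)
    epot, f = energy_forces(x)

    def total(epot: float, v: List[float]) -> float:
        return epot + sum(0.5 * mi * vi * vi for mi, vi in zip(m, v))

    e0 = total(epot, v)
    max_dev = 0.0
    for _ in range(n_steps):
        # Velocity Verlet: half kick, drift, new forces, half kick.
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        epot, f = energy_forces(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        max_dev = max(max_dev, abs(total(epot, v) - e0))
    return max_dev / len(x)


# Usage: a slightly distorted 4-atom chain starting at rest.
ef = harmonic_chain(k=1.0, r0=1.0)
x0, v0, masses = [0.0, 1.1, 2.0, 3.05], [0.0] * 4, [1.0] * 4
print(nve_energy_drift(x0, v0, masses, ef, dt=0.01, n_steps=2000))
print(nve_energy_drift(x0, v0, masses, ef, dt=0.5, n_steps=2000))
```

For a stable time step, velocity Verlet keeps the energy error bounded and shrinking roughly as `dt**2`, so a steadily growing deviation at a reasonable time step is a warning sign about the potential rather than the integrator.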
