COMP-PH · LG · Jul 24, 2023

Synthetic pre-training for neural-network interatomic potentials

arXiv:2307.15714v1 · 31 citations · h-index: 48
Originality: Incremental advance
AI Analysis

The paper addresses the data-dependency problem in atomistic materials modelling and offers a way to reduce reliance on costly quantum-mechanical data. The advance appears incremental, since it builds on synthetic-data ideas already established in other areas of ML.

The paper tackles the cost of training machine-learning interatomic potentials by proposing pre-training on synthetic data. Models that are pre-trained on a large synthetic dataset and then fine-tuned on a much smaller quantum-mechanical dataset show improved accuracy and stability, as demonstrated with equivariant graph-neural-network potentials for carbon.

Machine learning (ML) based interatomic potentials have transformed the field of atomistic materials modelling. However, ML potentials depend critically on the quality and quantity of quantum-mechanical reference data with which they are trained, and therefore developing datasets and training pipelines is becoming an increasingly central challenge. Leveraging the idea of "synthetic" (artificial) data that is common in other areas of ML research, we here show that synthetic atomistic data, themselves obtained at scale with an existing ML potential, constitute a useful pre-training task for neural-network interatomic potential models. Once pre-trained with a large synthetic dataset, these models can be fine-tuned on a much smaller, quantum-mechanical one, improving numerical accuracy and stability in computational practice. We demonstrate feasibility for a series of equivariant graph-neural-network potentials for carbon, and we carry out initial experiments to test the limits of the approach.
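The two-stage workflow described in the abstract (pre-train on abundant synthetic labels, then fine-tune on scarce reference labels) can be illustrated with a deliberately small toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: a one-dimensional pair distance replaces atomic structures, a Morse curve (`reference_energy`) stands in for quantum-mechanical labels, a perturbed copy of it (`teacher_energy`) stands in for the existing ML potential, and a linear model on Gaussian basis functions replaces the equivariant graph neural network. All function and variable names are hypothetical.

```python
import math
import random

def features(r, centers, width=0.3):
    """Gaussian radial basis features for a scalar distance r (each value lies in (0, 1])."""
    return [math.exp(-((r - c) / width) ** 2) for c in centers]

def predict(w, phi):
    """Linear model: energy is a weighted sum of basis features."""
    return sum(wi * pi for wi, pi in zip(w, phi))

def mse(w, data):
    """Mean squared error of the model over (features, label) pairs."""
    return sum((predict(w, phi) - y) ** 2 for phi, y in data) / len(data)

def train(w, data, steps, lr):
    """Full-batch gradient descent on the mean squared error; returns new weights."""
    w = list(w)
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for phi, y in data:
            err = predict(w, phi) - y
            for k, p in enumerate(phi):
                grad[k] += 2.0 * err * p / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

def reference_energy(r):
    """Toy stand-in for an expensive quantum-mechanical label (Morse curve)."""
    return (1.0 - math.exp(-2.0 * (r - 1.5))) ** 2 - 1.0

def teacher_energy(r):
    """Toy stand-in for an existing ML potential: cheap, close to the reference, not exact."""
    return reference_energy(r) + 0.05 * math.sin(4.0 * r)

def make_dataset(label_fn, n, centers, rng):
    """Sample n distances uniformly in [1, 3] and label them with label_fn."""
    rs = [rng.uniform(1.0, 3.0) for _ in range(n)]
    return [(features(r, centers), label_fn(r)) for r in rs]

rng = random.Random(0)
centers = [1.0 + 0.25 * k for k in range(9)]  # 9 basis functions spanning [1, 3]
lr = 1.0 / (2 * len(centers))                 # conservative step size for features bounded by 1

synthetic = make_dataset(teacher_energy, 300, centers, rng)   # large, cheap dataset
reference = make_dataset(reference_energy, 20, centers, rng)  # small, expensive dataset

w0 = [0.0] * len(centers)
w_pre = train(w0, synthetic, steps=200, lr=lr)      # stage 1: pre-train on synthetic labels
w_fine = train(w_pre, reference, steps=100, lr=lr)  # stage 2: fine-tune on reference labels

loss_pre_start = mse(w0, synthetic)
loss_pre_end = mse(w_pre, synthetic)
loss_ref_before = mse(w_pre, reference)
loss_ref_after = mse(w_fine, reference)
print(f"synthetic MSE: {loss_pre_start:.4f} -> {loss_pre_end:.4f}")
print(f"reference MSE: {loss_ref_before:.4f} -> {loss_ref_after:.4f}")
```

The point of the sketch is only the ordering of the two stages: fine-tuning starts from `w_pre` rather than from zero weights, so the small reference set has to correct the teacher's error instead of teaching the whole energy curve. Whether this helps for a real potential depends on the model and data, which is what the paper investigates.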

Code Implementations: 1 repo
