CHEM-PH | MTRL-SCI | LG | Oct 12, 2024

Many-body Expansion Based Machine Learning Models for Octahedral Transition Metal Complexes

arXiv:2410.09659v1 | 2 citations | h-index: 38 | Machine Learning: Science and Technology
Originality: Incremental advance
AI Analysis

This work addresses the challenge of distinguishing stereoisomers of coordination complexes in graph-based models used for materials screening. It is an incremental improvement: a 30-40% reduction in mean absolute error relative to the authors' previous featurization.

The researchers tackle the problem of distinguishing stereoisomers of octahedral transition metal complexes with graph-based machine learning models. They introduce a many-body expansion-based modification to molecular graph featurization, which reduces the error on spin-splitting energies and frontier orbital energy gaps by 30-40% compared to their previous approach.

Graph-based machine learning models for materials properties show great potential to accelerate virtual high-throughput screening of large chemical spaces. However, in their simplest forms, graph-based models do not include any 3D information and are unable to distinguish stereoisomers such as those arising from different orderings of ligands around a metal center in coordination complexes. In this work we present a modification to revised autocorrelation descriptors, our molecular graph featurization method for machine learning various spin state dependent properties of octahedral transition metal complexes (TMCs). Inspired by analytical semi-empirical models for TMCs, the new modeling strategy is based on the many-body expansion (MBE) and allows one to tune the captured stereoisomer information by changing the truncation order of the MBE. We present the necessary modifications to include this approach in two commonly used machine learning methods, kernel ridge regression and feed-forward neural networks. On a test set composed of all possible isomers of binary transition metal complexes, the best MBE models achieve mean absolute errors of 2.75 kcal/mol on spin-splitting energies and 0.26 eV on frontier orbital energy gaps, a 30-40% reduction in error compared to models based on our previous approach. We also observe improved generalization to previously unseen ligands where the best-performing models exhibit mean absolute errors of 4.00 kcal/mol (i.e., a 0.73 kcal/mol reduction) on the spin-splitting energies and 0.53 eV (i.e., a 0.10 eV reduction) on the frontier orbital energy gaps. Because the new approach incorporates insights from electronic structure theory, such as ligand additivity relationships, these models exhibit systematic generalization from homoleptic to heteroleptic complexes, allowing for efficient screening of TMC search spaces.
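The idea that the truncation order of a many-body expansion controls how much stereoisomer information is captured can be illustrated with a small toy example. The Python sketch below is not the authors' featurization; the site numbering, the ligand labels A and B, and the helper names one_body_terms and two_body_terms are assumptions made purely for illustration. It shows that first-order (single-ligand) terms cannot tell the cis and trans isomers of an MA4B2 complex apart, while second-order (ligand-pair) terms can.

```python
from collections import Counter
from itertools import combinations

# Assumed site numbering for an octahedron (hypothetical convention):
# sites (0, 1), (2, 3) and (4, 5) are trans to each other; all other pairs are cis.
TRANS_PAIRS = {(0, 1), (2, 3), (4, 5)}


def one_body_terms(ligands):
    """First-order terms: how often each ligand type occurs, ignoring position."""
    return Counter(ligands)


def two_body_terms(ligands):
    """Second-order terms: ligand pairs labelled by their cis/trans relationship."""
    terms = Counter()
    for i, j in combinations(range(6), 2):  # always yields i < j
        relation = "trans" if (i, j) in TRANS_PAIRS else "cis"
        pair = tuple(sorted((ligands[i], ligands[j])))
        terms[(pair, relation)] += 1
    return terms


# Two stereoisomers of MA4B2, given as the ligand on each of the six sites.
trans_isomer = ("B", "B", "A", "A", "A", "A")  # the two B ligands sit opposite each other
cis_isomer = ("B", "A", "B", "A", "A", "A")    # the two B ligands sit next to each other

# Truncating at first order: both isomers look identical.
print(one_body_terms(trans_isomer) == one_body_terms(cis_isomer))  # True

# Including second-order terms: the isomers become distinguishable.
print(two_body_terms(trans_isomer) == two_body_terms(cis_isomer))  # False
```

In a model built on such terms, a first-order truncation would correspond to a purely ligand-additive prediction, and higher orders add progressively more information about the relative arrangement of ligands.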
