MTRL-SCI · Nov 1, 2023
The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture
Anuroop Sriram, Sihoon Choi, Xiaohan Yu et al. · baidu, cmu
New methods for carbon dioxide removal are urgently needed to combat global climate change. Direct air capture (DAC) is an emerging technology to capture carbon dioxide directly from ambient air. Metal-organic frameworks (MOFs) have been widely studied as potentially customizable adsorbents for DAC. However, discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature. We explore a computational approach benefiting from recent innovations in machine learning (ML) and present a dataset named Open DAC 2023 (ODAC23) consisting of more than 38M density functional theory (DFT) calculations on more than 8,400 MOF materials containing adsorbed CO$_2$ and/or H$_2$O. ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available. In addition to probing properties of adsorbed molecules, the dataset is a rich source of information on structural relaxation of MOFs, which will be useful in many contexts beyond specific applications for DAC. A large number of MOFs with promising properties for DAC are identified directly in ODAC23. We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level. This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications, including DAC.
LG · Apr 12, 2023
Maximum-likelihood Estimators in Physics-Informed Neural Networks for High-dimensional Inverse Problems
Gabriel S. Gusmão, Andrew J. Medford
Physics-informed neural networks (PINNs) have proven a suitable mathematical scaffold for solving inverse problems in ordinary (ODE) and partial (PDE) differential equations. Typical inverse PINNs are formulated as soft-constrained multi-objective optimization problems with several hyperparameters. In this work, we demonstrate that inverse PINNs can be framed in terms of maximum-likelihood estimators (MLE) to allow explicit error propagation from interpolation to the physical model space through Taylor expansion, without the need for hyperparameter tuning. We explore its application to high-dimensional coupled ODEs constrained by differential algebraic equations that are common in transient chemical and biological kinetics. Furthermore, we show that singular-value decomposition (SVD) of the ODE coupling matrices (reaction stoichiometry matrix) provides reduced uncorrelated subspaces in which PINN solutions can be represented and over which residuals can be projected. Finally, SVD bases serve as preconditioners for the inversion of covariance matrices in this hyperparameter-free robust application of MLE to "kinetics-informed neural networks".
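The SVD step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three-species network A → B → C, the function name `reduced_subspace`, and the tolerance are hypothetical choices made only to show how the left singular vectors of a stoichiometry matrix split species space into a reduced subspace (where the dynamics live) and a complement (conservation laws).

```python
import numpy as np

def reduced_subspace(stoich, tol=1e-10):
    """Split species space using the SVD of a stoichiometry matrix.

    stoich has shape (n_species, n_reactions), so dc/dt = stoich @ rates.
    Returns (U_r, U_0, singular_values): U_r spans the range of stoich,
    U_0 spans its left null space (conserved combinations of species).
    """
    U, s, _ = np.linalg.svd(stoich, full_matrices=True)
    rank = int(np.sum(s > tol))
    return U[:, :rank], U[:, rank:], s[:rank]

# Hypothetical toy network: A -> B, B -> C (rows: A, B, C; columns: reactions)
S = np.array([[-1.0, 0.0],
              [1.0, -1.0],
              [0.0, 1.0]])

U_r, U_0, sv = reduced_subspace(S)

# Arbitrary illustrative reaction rates
rates = np.array([0.3, 0.1])
dcdt = S @ rates

z = U_r.T @ dcdt              # time derivative in reduced coordinates
null_component = U_0.T @ dcdt  # component along the conservation law
```

For this toy network the reduced subspace is two-dimensional, and the remaining direction is proportional to (1, 1, 1), reflecting that the total concentration of A, B, and C is conserved. Residuals projected onto `U_0` vanish identically, so only the `U_r` coordinates carry kinetic information.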
MTRL-SCI · Aug 5, 2025
The Open DAC 2025 Dataset for Sorbent Discovery in Direct Air Capture
Anuroop Sriram, Logan M. Brabson, Xiaohan Yu et al. · baidu, cmu
Identifying useful sorbent materials for direct air capture (DAC) from humid air remains a challenge. We present the Open DAC 2025 (ODAC25) dataset, a significant expansion and improvement upon ODAC23 (Sriram et al., ACS Central Science, 10 (2024) 923), comprising nearly 60 million DFT single-point calculations for CO$_2$, H$_2$O, N$_2$, and O$_2$ adsorption in 15,000 MOFs. ODAC25 introduces chemical and configurational diversity through functionalized MOFs, high-energy GCMC-derived placements, and synthetically generated frameworks. ODAC25 also significantly improves upon the accuracy of DFT calculations and the treatment of flexible MOFs in ODAC23. Along with the dataset, we release new state-of-the-art machine-learned interatomic potentials trained on ODAC25 and evaluate them on adsorption energy and Henry's law coefficient predictions.
CHEM-PH · Oct 21, 2025
Prospects for Using Artificial Intelligence to Understand Intrinsic Kinetics of Heterogeneous Catalytic Reactions
Andrew J. Medford, Todd N. Whittaker, Bjarne Kreitz et al.
Artificial intelligence (AI) is influencing heterogeneous catalysis research by accelerating simulations and materials discovery. A key frontier is integrating AI with multiscale models and multimodal experiments to address the "many-to-one" challenge of linking intrinsic kinetics to observables. Advances in machine-learned force fields, microkinetics, and reactor modeling enable rapid exploration of chemical spaces, while operando and transient data provide unprecedented insight. Yet, inconsistent data quality and model complexity limit mechanistic discovery. Generative and agentic AI can automate model generation, quantify uncertainty, and couple theory with experiment, realizing "self-driving models" that produce interpretable, reproducible, and transferable understanding of catalytic systems.
LG · Sep 27, 2021
A Priori Calibration of Transient Kinetics Data via Machine Learning
M. Ross Kunz, Adam Yonge, Rakesh Batchu et al.
The temporal analysis of products (TAP) reactor provides a vast amount of transient kinetic information that may be used to describe a variety of chemical features including the residence time distribution, kinetic coefficients, number of active sites, and the reaction mechanism. However, as with any measurement device, the TAP reactor signal is convoluted with noise. To reduce the uncertainty of the kinetic measurement and any derived parameters or mechanisms, proper preprocessing must be performed prior to any advanced analysis. This preprocessing consists of baseline correction, i.e., a shift in the voltage response, and calibration, i.e., a scaling of the flux response based on prior experiments. The current preprocessing methodology requires significant user discretion and relies on previous experiments that may drift over time. Herein we use machine learning techniques combined with physical constraints to convert the raw instrument signal to chemical information. As such, the proposed methodology demonstrates clear benefits over traditional preprocessing in the calibration of the inert and feed mixture products, without the need for prior calibration experiments or heuristic input from the user.
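As a point of reference for the two preprocessing steps named in the abstract (a baseline shift followed by a calibration scaling), a minimal sketch of the conventional approach might look like the following. It is not the machine-learning method the paper proposes. The function `preprocess`, the choice of the tail median as the baseline estimate, and the synthetic pulse are all assumptions made for illustration.

```python
import numpy as np

def preprocess(voltage, tail_fraction=0.1, calibration=1.0):
    """Conventional two-step preprocessing of a pulse response.

    1. Baseline correction: subtract a constant offset, estimated here
       (an assumption) as the median of the last tail_fraction of samples.
    2. Calibration: multiply by a user-supplied scaling coefficient.
    Returns (flux, baseline).
    """
    n_tail = max(1, int(len(voltage) * tail_fraction))
    baseline = float(np.median(voltage[-n_tail:]))
    return (voltage - baseline) * calibration, baseline

# Synthetic pulse response: a decaying pulse, scaled and offset
t = np.linspace(0.0, 1.0, 200)
pulse = t * np.exp(-20.0 * t)   # hypothetical "true" flux shape
raw = 0.5 * pulse + 0.2         # instrument gain 0.5, voltage offset 0.2

flux, baseline = preprocess(raw, tail_fraction=0.1, calibration=2.0)
```

Both the tail window and the calibration coefficient are supplied by the user here, which is exactly the kind of heuristic input and dependence on prior experiments that the abstract identifies as a weakness of the traditional workflow.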
CHEM-PH · Feb 4, 2021
A Universal Framework for Featurization of Atomistic Systems
Xiangyun Lei, Andrew J. Medford
Molecular dynamics simulations are an invaluable tool in numerous scientific fields. However, the ubiquitous classical force fields cannot describe reactive systems, and quantum molecular dynamics are too computationally demanding to treat large systems or long timescales. Reactive force fields based on physics or machine learning can be used to bridge the gap in time and length scales, but these force fields require substantial effort to construct and are highly specific to a given chemical composition and application. A significant limitation of machine learning models is the use of element-specific features, leading to models that scale poorly with the number of elements. This work introduces the Gaussian multipole (GMP) featurization scheme that utilizes physically-relevant multipole expansions of the electron density around atoms to yield feature vectors that interpolate between element types and have a fixed dimension regardless of the number of elements present. We combine GMP with neural networks to directly compare it to the widely used Behler-Parrinello symmetry functions for the MD17 dataset, revealing that it exhibits improved accuracy and computational efficiency. Further, we demonstrate that GMP-based models can achieve chemical accuracy for the QM9 dataset, and their accuracy remains reasonable even when extrapolating to new elements. Finally, we test GMP-based models for the Open Catalyst Project (OCP) dataset, revealing comparable performance to graph convolutional deep learning models. The results indicate that this featurization scheme fills a critical gap in the construction of efficient and transferable machine-learned force fields.
LG · Nov 30, 2020
Kinetics-Informed Neural Networks
Gabriel S. Gusmão, Adhika P. Retnanto, Shashwati C. da Cunha et al.
Chemical kinetics and reaction engineering provide the phenomenological framework for the disentanglement of reaction mechanisms, the optimization of reaction performance, and the rational design of chemical processes. Here, we utilize feed-forward artificial neural networks as basis functions to solve ordinary differential equations (ODEs) constrained by differential algebraic equations (DAEs) that describe microkinetic models (MKMs). We present an algebraic framework for the mathematical description and classification of reaction networks, types of elementary reaction, and chemical species. Under this framework, we demonstrate that the simultaneous training of neural nets and kinetic model parameters in a regularized multi-objective optimization setting leads to the solution of the inverse problem through the estimation of kinetic parameters from synthetic experimental data. We analyze a set of scenarios to establish the extent to which kinetic parameters can be retrieved from transient kinetic data, and assess the robustness of the methodology with respect to statistical noise. This approach to inverse kinetic ODEs can assist in the elucidation of reaction mechanisms based on transient data.
AP · Nov 17, 2020
Data Driven Reaction Mechanism Estimation via Transient Kinetics and Machine Learning
M. Ross Kunz, Adam Yonge, Zongtang Fang et al.
Understanding the set of elementary steps and kinetics in each reaction is extremely valuable for making informed decisions about creating the next generation of catalytic materials. Given the physical and mechanistic complexity of industrial catalysts, it is critical to obtain kinetic information through experimental methods. As such, this work details a methodology based on the combination of transient rate/concentration dependencies and machine learning to measure the number of active sites and the individual rate constants, and to gain insight into the mechanism under a complex set of elementary steps. This new methodology was applied to simulated transient responses to verify its ability to obtain correct estimates of the micro-kinetic coefficients. Furthermore, experimental CO oxidation data were analyzed to reveal the Langmuir-Hinshelwood mechanism driving the reaction. As oxygen accumulated on the catalyst, a transition in the mechanism was clearly defined in the machine learning analysis due to the large amount of kinetic information available from transient reaction techniques. This methodology is proposed as a new data-driven approach to characterize how materials control complex reaction mechanisms, relying exclusively on experimental data.
HC · Aug 20, 2019
ElectroLens: Understanding Atomistic Simulations Through Spatially-resolved Visualization of High-dimensional Features
Xiangyun Lei, Fred Hohman, Duen Horng Chau et al.
In recent years, machine learning (ML) has gained significant popularity in the field of chemical informatics and electronic structure theory. These techniques often require researchers to engineer abstract "features" that encode chemical concepts into a mathematical form compatible with the input to machine-learning models. However, there is no existing tool to connect these abstract features back to the actual chemical system, making it difficult to diagnose failures and to build intuition about the meaning of the features. We present ElectroLens, a new visualization tool for high-dimensional spatially-resolved features to tackle this problem. The tool visualizes high-dimensional data sets for atomistic and electron environment features through a series of linked 3D views and 2D plots. It connects different derived features to their corresponding regions in 3D via interactive selection. It is built to be scalable and to integrate with existing infrastructure.