MTRL-SCI · LG · Nov 1, 2023

The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture

Baidu · CMU · Meta AI
arXiv:2311.00341v2 · 89 citations · h-index: 90 · Has Code
AI Analysis

This work addresses sorbent discovery for carbon dioxide removal by providing a large dataset and baseline models for researchers in materials science and climate technology. Its contribution is incremental in method, since it builds on existing computational and ML techniques.

The authors tackle the discovery of metal-organic frameworks (MOFs) for direct air capture by creating the Open DAC 2023 dataset, which comprises more than 38 million DFT calculations on more than 8,400 MOF materials. They identify many promising MOFs directly in the dataset and train state-of-the-art ML models to approximate the DFT calculations.

New methods for carbon dioxide removal are urgently needed to combat global climate change. Direct air capture (DAC) is an emerging technology to capture carbon dioxide directly from ambient air. Metal-organic frameworks (MOFs) have been widely studied as potentially customizable adsorbents for DAC. However, discovering promising MOF sorbents for DAC is challenging because of the vast chemical space to explore and the need to understand materials as functions of humidity and temperature. We explore a computational approach benefiting from recent innovations in machine learning (ML) and present a dataset named Open DAC 2023 (ODAC23) consisting of more than 38M density functional theory (DFT) calculations on more than 8,400 MOF materials containing adsorbed $CO_2$ and/or $H_2O$. ODAC23 is by far the largest dataset of MOF adsorption calculations at the DFT level of accuracy currently available. In addition to probing properties of adsorbed molecules, the dataset is a rich source of information on structural relaxation of MOFs, which will be useful in many contexts beyond specific applications for DAC. A large number of MOFs with promising properties for DAC are identified directly in ODAC23. We also trained state-of-the-art ML models on this dataset to approximate calculations at the DFT level. This open-source dataset and our initial ML models will provide an important baseline for future efforts to identify MOFs for a wide range of applications, including DAC.
