LG · CHEM-PH · Oct 25, 2022

MOFormer: Self-Supervised Transformer model for Metal-Organic Framework Property Prediction

arXiv:2210.14188v1 · 151 citations · h-index: 43
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accelerating MOF discovery for applications like energy storage and gas separation by providing a more efficient screening method, though it is incremental as it builds on existing deep learning and self-supervised techniques.

The authors address efficient property prediction for Metal-Organic Frameworks (MOFs) with MOFormer, a structure-agnostic Transformer model that takes text-string inputs and so avoids 3D structure optimization. They also introduce a self-supervised learning framework for pretraining. Pretraining improves prediction accuracy on downstream tasks, and MOFormer can be more data-efficient than the structure-based CGCNN on quantum-chemical property prediction when training data is limited.

Metal-Organic Frameworks (MOFs) are highly porous materials that can be used for applications in energy storage, water desalination, gas storage, and gas separation. However, the chemical space of MOFs is nearly infinite due to the large variety of possible combinations of building blocks and topologies. Discovering the optimal MOFs for specific applications requires an efficient and accurate search over an enormous number of potential candidates. Previous high-throughput screening methods that rely on computational simulations such as DFT can be time-consuming. Such methods also require optimizing the 3D atomic structure of each MOF, which adds an extra step when evaluating hypothetical MOFs. In this work, we propose MOFormer, a structure-agnostic deep learning method based on the Transformer model, for property prediction of MOFs. MOFormer takes a text string representation of a MOF (MOFid) as input, thus circumventing the need to obtain the 3D structure of a hypothetical MOF and accelerating the screening process. Furthermore, we introduce a self-supervised learning framework that pretrains MOFormer by maximizing the cross-correlation between its structure-agnostic representations and the structure-based representations of a crystal graph convolutional neural network (CGCNN) on more than 400k publicly available MOFs. Self-supervised learning allows MOFormer to learn 3D structural information implicitly even though it is not included in the input. Experiments show that pretraining improves the prediction accuracy of both models on various downstream prediction tasks. Furthermore, we show that MOFormer can be more data-efficient than the structure-based CGCNN on quantum-chemical property prediction when training data is limited. Overall, MOFormer provides a novel perspective on efficient MOF design using deep learning.
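The abstract describes pretraining by maximizing the cross-correlation between the text-based and structure-based representations but does not spell out the objective. One common way to formulate such a goal is a Barlow Twins-style loss, which pushes the cross-correlation matrix of the two embedding batches toward the identity. The sketch below illustrates that idea in plain Python. Treating this as the paper's loss is an assumption, and the function names, the `off_diag_weight` value, and the toy embeddings are all hypothetical.

```python
import math


def standardize_columns(batch):
    """Return each embedding dimension as a zero-mean, unit-variance column."""
    n = len(batch)
    dims = len(batch[0])
    columns = []
    for d in range(dims):
        col = [row[d] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = math.sqrt(var) + 1e-12  # small constant avoids division by zero
        columns.append([(x - mean) / std for x in col])
    return columns


def cross_correlation(z_a, z_b):
    """Cross-correlation matrix between two batches of equal-size embeddings."""
    n = len(z_a)
    cols_a = standardize_columns(z_a)
    cols_b = standardize_columns(z_b)
    return [
        [sum(a * b for a, b in zip(col_a, col_b)) / n for col_b in cols_b]
        for col_a in cols_a
    ]


def cross_correlation_loss(z_a, z_b, off_diag_weight=0.005):
    """Barlow Twins-style loss (assumed form, hypothetical weight).

    The diagonal term rewards agreement between matching dimensions of the
    two representations; the off-diagonal term penalizes redundancy between
    different dimensions.
    """
    c = cross_correlation(z_a, z_b)
    dims = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(dims))
    off_diag = sum(
        c[i][j] ** 2 for i in range(dims) for j in range(dims) if i != j
    )
    return on_diag + off_diag_weight * off_diag


# Toy stand-ins for a batch of text-based and structure-based embeddings.
text_embeddings = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
structure_embeddings = [[2.0, 0.5], [2.0, -0.5], [-2.0, 0.5], [-2.0, -0.5]]
loss = cross_correlation_loss(text_embeddings, structure_embeddings)
```

In this toy batch the two sets of embeddings differ only by a per-dimension scale, so after standardization they are perfectly correlated and the loss is close to zero. In an actual training setup the embeddings would come from the two encoders and the loss would be minimized with an automatic-differentiation framework.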

Code Implementations: 1 repo
