LG · QM · Nov 20, 2022

Heterogenous Ensemble of Models for Molecular Property Prediction

arXiv:2211.11035v1 · h-index: 10 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides an incremental improvement in molecular property prediction for computational chemistry and drug discovery.

The authors tackled molecular property prediction by combining multiple model architectures across different molecular modalities and ensembling them, achieving a test-challenge MAE of 0.0723 and winning the OGB Large-Scale Challenge 2022.

Previous works have demonstrated the importance of considering different modalities of molecules, each of which provides a different granularity of information for downstream property prediction tasks. Our method combines variants of the recent TransformerM architecture with Transformer, GNN, and ResNet backbone architectures. Models are trained on the 2D, 3D, and image modalities of molecular graphs, and their predictions are ensembled with a HuberRegressor. The models are trained on 4 different train/validation splits of the original train + valid datasets. This yields a winning solution to the 2nd edition of the OGB Large-Scale Challenge (2022) on the PCQM4Mv2 molecular property prediction dataset. Our proposed method achieves a test-challenge MAE of 0.0723 and a validation MAE of 0.07145. Total inference time for our solution is less than 2 hours. We open-source our code at https://github.com/jfpuget/NVIDIA-PCQM4Mv2.
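To make the ensembling step concrete, here is a minimal sketch of stacking base-model predictions with scikit-learn's `HuberRegressor`. It is not the authors' pipeline: the synthetic targets, the three noisy "base models", the helper name `stack_with_huber`, and the 4-fold out-of-fold evaluation of the stacker are all assumptions made for illustration. In the paper the 4 splits refer to how the base models are trained, which this sketch does not reproduce.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor
from sklearn.model_selection import KFold


def stack_with_huber(base_preds, targets, n_splits=4, seed=0):
    """Fit a HuberRegressor on base-model predictions (hypothetical helper).

    base_preds: array of shape (n_samples, n_models), one column per base model.
    targets:    array of shape (n_samples,).
    Returns the stacker fitted on all rows and its out-of-fold MAE.
    """
    oof = np.zeros(len(targets), dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, held_idx in folds.split(base_preds):
        # Fit the stacker on one part, predict the held-out part.
        stacker = HuberRegressor(max_iter=1000)
        stacker.fit(base_preds[fit_idx], targets[fit_idx])
        oof[held_idx] = stacker.predict(base_preds[held_idx])
    oof_mae = float(np.mean(np.abs(oof - targets)))
    # Final stacker uses every available row.
    final = HuberRegressor(max_iter=1000).fit(base_preds, targets)
    return final, oof_mae


# Synthetic stand-in for real model outputs (assumption, not PCQM4Mv2 data):
# each "base model" predicts the target plus independent Gaussian noise.
rng = np.random.default_rng(0)
targets = rng.normal(loc=5.0, scale=1.0, size=2000)
noise_scales = [0.10, 0.15, 0.20]
base_preds = np.column_stack(
    [targets + rng.normal(0.0, s, size=targets.size) for s in noise_scales]
)

ensemble, oof_mae = stack_with_huber(base_preds, targets)
single_maes = np.mean(np.abs(base_preds - targets[:, None]), axis=0)

print("single-model MAE:", np.round(single_maes, 4))
print("stacked out-of-fold MAE:", round(oof_mae, 4))
print("stacker weights:", np.round(ensemble.coef_, 3))
```

The Huber loss is quadratic for small residuals and linear for large ones, so a few molecules on which all base models are badly wrong pull the ensemble weights less than they would under ordinary least squares. That is one plausible reason to prefer it when the evaluation metric is MAE.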

Code Implementations: 1 repo