LG, AI, QM · Feb 13, 2022

Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport

arXiv:2202.06208v3 · 15 citations
AI Analysis

This work addresses generalization challenges in molecular representation learning for chemistry and materials science, which could accelerate the discovery of new substances. It appears incremental, however, because it builds on existing optimal transport methods.

The authors address limited or heterogeneous training data in molecular regression by developing MROT, an optimal transport-based algorithm that improves generalization. In the reported experiments on chemical property prediction and materials adsorption selection, MROT significantly outperformed state-of-the-art models.

Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties.
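To make the optimal transport idea in the abstract concrete, the following is a minimal sketch of generic entropic optimal transport (Sinkhorn iterations) used to carry regression labels from a source domain to a target domain. It is not the MROT algorithm: the learned domain-distance metric and the posterior variance regularizer described above are not implemented, and a plain squared Euclidean cost stands in for them. The function names `sinkhorn_plan` and `transport_labels`, the synthetic data, and the values of `reg` and `n_iters` are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, reg=0.5, n_iters=500):
    """Entropic-regularized transport plan between histograms a and b.

    Hypothetical helper: standard Sinkhorn scaling, not the MROT objective.
    """
    K = np.exp(-cost / reg)            # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # rescale to match target marginal
        u = a / (K @ v)                # rescale to match source marginal
    return u[:, None] * K * v[None, :]


def transport_labels(plan, y_source):
    """Barycentric projection: each target point receives a plan-weighted
    average of the source labels (hypothetical helper)."""
    return (plan.T @ y_source) / plan.sum(axis=0)


# Synthetic stand-in for molecular embeddings (assumption: 2-D features).
rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=(40, 2))    # labeled source domain
y_src = x_src.sum(axis=1)                     # toy continuous property
x_tgt = rng.normal(0.5, 1.0, size=(30, 2))    # unlabeled, shifted target

# Squared Euclidean cost, scaled to [0, 1] for numerical stability.
cost = ((x_src[:, None, :] - x_tgt[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

a = np.full(40, 1.0 / 40)                     # uniform source weights
b = np.full(30, 1.0 / 30)                     # uniform target weights

plan = sinkhorn_plan(cost, a, b)
y_tgt_hat = transport_labels(plan, y_src)     # transported label estimates
```

In this sketch the transport plan couples source and target samples, and the barycentric projection gives each unlabeled target point a continuous label estimate. A method such as MROT would, per the abstract, replace the fixed cost with a learned metric and add a regularizer on the plan.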

Code Implementations: 1 repo