CHEM-PH, MTRL-SCI, LG | Jul 9, 2024

MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction

Jun-Hyung Park, Yeachan Kim, Mingyu Lee, Hyuntae Park, SangKeun Lee

arXiv:2408.01426v1 | 15.223 citations | h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in drug and materials design by improving molecular property prediction, though it appears incremental because it builds on existing Transformer-based methods.

The paper tackles overfitting and limited scalability in chemical language representation learning for molecular property prediction. It introduces MolTRES, which combines generator-discriminator training with the integration of external knowledge, and reports that the model outperforms state-of-the-art models on popular molecular property prediction tasks.

Chemical representation learning has gained increasing interest due to the limited availability of supervised data in fields such as drug and materials design. This interest particularly extends to chemical language representation learning, which involves pre-training Transformers on SMILES sequences (textual descriptors of molecules). Despite its success in molecular property prediction, current practice often leads to overfitting and limited scalability due to early convergence. In this paper, we introduce a novel chemical language representation learning framework, called MolTRES, to address these issues. MolTRES incorporates generator-discriminator training, allowing the model to learn from more challenging examples that require structural understanding. In addition, we enrich molecular representations by transferring knowledge from scientific literature through the integration of external materials embeddings. Experimental results show that our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
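The abstract does not spell out the training objective. Assuming it follows the ELECTRA-style replaced-token-detection pattern that "generator-discriminator training" commonly refers to (an assumption, not confirmed here), the Python sketch below shows the data flow on SMILES: tokenize a molecule, let a generator overwrite some tokens, and score a discriminator that labels each position as original or replaced. The tokenizer regex and the names `tokenize_smiles`, `make_rtd_example`, and `rtd_loss` are hypothetical, and the generator is a uniform-random stand-in rather than the learned model MolTRES would use. This is not the paper's implementation.

```python
import math
import random
import re

# Regex SMILES tokenizer of the kind commonly used for chemical language models.
# It keeps bracket atoms, two-letter halogens (Br, Cl), ring-closure digits,
# and bond symbols as single tokens.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; fail loudly if any character is skipped."""
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def make_rtd_example(tokens, vocab, mask_prob, rng):
    """Build one replaced-token-detection example.

    HYPOTHETICAL generator: each selected position is filled with a token drawn
    uniformly from `vocab`. An ELECTRA-style setup would use a small masked LM here.
    Returns (corrupted_tokens, labels) with label 1 = replaced, 0 = original.
    """
    corrupted = list(tokens)
    labels = [0] * len(tokens)
    for i, original in enumerate(tokens):
        if rng.random() < mask_prob:
            sampled = rng.choice(vocab)
            corrupted[i] = sampled
            # ELECTRA convention: if the generator reproduces the original token,
            # the position is labelled "original", not "replaced".
            labels[i] = int(sampled != original)
    return corrupted, labels


def rtd_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy of discriminator outputs P(token was replaced)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    vocab = sorted(set(tokens))
    corrupted, labels = make_rtd_example(tokens, vocab, mask_prob=0.15, rng=rng)
    print("original: ", " ".join(tokens))
    print("corrupted:", " ".join(corrupted))
    print("labels:   ", labels)
    # A discriminator that outputs 0.5 everywhere scores ln 2 per token.
    print("loss at p=0.5:", rtd_loss([0.5] * len(labels), labels))
```

Uniform sampling mostly yields chemically implausible substitutions that are easy to detect; the motivation for a learned generator, as the abstract frames it, is to supply harder examples whose detection requires structural understanding.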

