MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction
MolTRES addresses a bottleneck in drug and materials design by improving molecular property prediction, though the contribution appears incremental because it builds on existing Transformer-based methods.
The paper tackles overfitting and limited scalability in chemical language representation learning for molecular property prediction by introducing MolTRES, which combines generator-discriminator training with external knowledge integration and outperforms state-of-the-art models on popular tasks.
Chemical representation learning has gained increasing interest due to the limited availability of supervised data in fields such as drug and materials design. This interest particularly extends to chemical language representation learning, which involves pre-training Transformers on SMILES sequences (textual descriptors of molecules). Despite its success in molecular property prediction, current practice often leads to overfitting and limited scalability due to early convergence. In this paper, we introduce a novel chemical language representation learning framework, called MolTRES, to address these issues. MolTRES incorporates generator-discriminator training, allowing the model to learn from more challenging examples that require structural understanding. In addition, we enrich molecular representations by integrating external materials embeddings, which transfer knowledge from scientific literature. Experimental results show that our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
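The generator-discriminator idea mentioned in the abstract can be illustrated with a small data-preparation sketch. It is not the MolTRES implementation: the abstract does not specify the tokenizer, the corruption rate, or the generator, so the regex tokenizer, the `mask_rate` value, and the uniform random sampler standing in for a learned generator are all assumptions made for illustration. The sketch only shows how a corrupted SMILES sequence and per-token "replaced or original" labels for a discriminator might be produced.

```python
import random
import re

# Regex SMILES tokenizer (an assumption; the paper's tokenizer may differ).
# Bracket atoms and two-letter halogens are matched before single characters.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]"
)


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def corrupt(tokens, vocab, mask_rate=0.3, rng=None):
    """Replace a random subset of tokens and return (corrupted, labels).

    labels[i] is 1 if position i differs from the original token, else 0.
    A uniform sampler over `vocab` stands in for a learned generator
    (hypothetical; a real generator would be a small masked language model).
    """
    if rng is None:
        rng = random.Random(0)
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_mask)
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = rng.choice(vocab)
    # If the sampler happens to reproduce the original token, the position
    # is labeled as original (0), so the discriminator target stays truthful.
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels


if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    toks = tokenize(aspirin)
    corrupted, labels = corrupt(toks, sorted(set(toks)), rng=random.Random(1))
    print("".join(corrupted))
    print(labels)
```

A discriminator trained on such pairs has to judge every position, not only the masked ones, which is one plausible reading of why the abstract describes these examples as more challenging than plain masked-token prediction.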