BERTology of Molecular Property Prediction
This study addresses a problem relevant to researchers in computational chemistry and drug discovery by clarifying why reported model performance is inconsistent, though the contribution is incremental because it builds on existing chemical language model (CLM) methods.
The study tackles the inconsistent performance of chemical language models (CLMs) in molecular property prediction through hundreds of controlled experiments that analyze factors such as dataset size, model size, and standardization, with the aim of providing comprehensive numerical evidence and a deeper understanding of the underlying mechanisms.
Chemical language models (CLMs) have emerged as promising competitors to popular classical machine learning models for molecular property prediction (MPP) tasks. However, an increasing number of studies have reported inconsistent and contradictory results for the performance of CLMs across various MPP benchmark tasks. In this study, we conduct and analyze hundreds of meticulously controlled experiments to systematically investigate the effects of various factors, such as dataset size, model size, and standardization, on the pre-training and fine-tuning performance of CLMs for MPP. In the absence of well-established scaling laws for encoder-only masked language models, our aim is to provide comprehensive numerical evidence and a deeper understanding of the underlying mechanisms affecting the performance of CLMs for MPP tasks, some of which appear to be entirely overlooked in the literature.