LG | CHEM-PH | Oct 25, 2023

Transferring a molecular foundation model for polymer property predictions

arXiv:2310.16958v1 | 19 citations | h-index: 20 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses data scarcity for researchers in polymer science, but it is incremental as it applies an existing transfer learning approach to a new domain.

The paper tackles data scarcity in polymer property prediction by transferring a transformer model pretrained on small molecules, achieving accuracy comparable to that of models trained on augmented polymer datasets.

Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and materials discovery. Self-supervised pretraining of transformer models requires large-scale datasets, which are often sparsely populated in topical areas such as polymer science. State-of-the-art approaches for polymers use data augmentation to generate additional samples, but this unavoidably incurs extra computational cost. In contrast, large-scale open-source datasets are available for small molecules and provide a potential solution to data scarcity through transfer learning. In this work, we show that transformers pretrained on small molecules and fine-tuned on polymer properties achieve accuracy comparable to that of models trained on augmented polymer datasets across a series of benchmark prediction tasks.
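The transfer-learning idea in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's model or code: the "pretrained encoder" below is a hypothetical fixed matrix over token counts standing in for a transformer pretrained on small molecules, the polymer SMILES strings and target values are made up, and only a new linear head is trained (a simplification of full fine-tuning, in which the encoder weights would also be updated).

```python
# Toy transfer learning: reuse a frozen "pretrained" encoder, train a new head.
# All weights, SMILES strings, and target values here are invented for illustration.

VOCAB = ["C", "O", "N", "=", "(", ")", "*"]

# Hypothetical stand-in for encoder weights learned on small molecules.
PRETRAINED_WEIGHTS = [
    [0.5, 0.1, 0.0, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.3, 0.0, 0.1, 0.1, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5],
]


def encode(smiles):
    """Map a SMILES string to features using the frozen encoder."""
    counts = [smiles.count(tok) for tok in VOCAB]
    return [sum(w * c for w, c in zip(row, counts)) for row in PRETRAINED_WEIGHTS]


def predict(head, features):
    """Linear regression head on top of the encoder features."""
    weights, bias = head
    return sum(w * f for w, f in zip(weights, features)) + bias


def mse(head, data):
    """Mean squared error of the head over (smiles, target) pairs."""
    return sum((predict(head, encode(s)) - y) ** 2 for s, y in data) / len(data)


def fine_tune(data, lr=0.05, epochs=500):
    """Train only the head with per-sample gradient steps; the encoder stays fixed."""
    weights, bias = [0.0] * len(PRETRAINED_WEIGHTS), 0.0
    for _ in range(epochs):
        for smiles, target in data:
            feats = encode(smiles)
            err = predict((weights, bias), feats) - target
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
            bias -= lr * err
    return weights, bias


# Made-up polymer repeat units ("*" marks attachment points) with made-up targets.
POLYMER_DATA = [("*CC*", 1.0), ("*CC(C)*", 1.5), ("*CCO*", 2.0), ("*CC(=O)N*", 3.0)]

initial_head = ([0.0] * len(PRETRAINED_WEIGHTS), 0.0)
tuned_head = fine_tune(POLYMER_DATA)
print("MSE before:", mse(initial_head, POLYMER_DATA))
print("MSE after: ", mse(tuned_head, POLYMER_DATA))
```

The point of the sketch is only the division of labour: features come from a model trained elsewhere, and the small polymer dataset is used solely to adapt the prediction layer.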

