Joint Embedding Predictive Architecture for self-supervised pretraining on polymer molecular graphs
This work addresses the challenge of data scarcity for researchers in polymer discovery, though the contribution is incremental: it adapts an existing self-supervised learning (SSL) method to a new domain.
The authors address the scarcity of labeled data for polymer property prediction by applying the Joint Embedding Predictive Architecture (JEPA) for self-supervised pretraining on polymer molecular graphs. Pretraining improved downstream performance on all tested datasets, most notably when labeled data was very limited.
Recent advances in machine learning (ML) have shown promise in accelerating the discovery of polymers with desired properties, for example by enabling virtual screening via property prediction. However, progress in polymer ML is hampered by the scarcity of high-quality labeled datasets, which are necessary for training supervised ML models. In this work, we study the recently proposed Joint Embedding Predictive Architecture (JEPA), a type of architecture for self-supervised learning (SSL), on polymer molecular graphs, and ask whether pretraining with this SSL strategy improves downstream performance when labeled data is scarce. Our results indicate that JEPA-based self-supervised pretraining on polymer graphs improves downstream performance on all tested datasets, particularly when labeled data is very scarce.
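To make the pretraining idea concrete, the following is a minimal sketch of a JEPA-style objective on a single graph, not the implementation used in this work. It assumes the general JEPA recipe: a context encoder embeds one subgraph, a target encoder embeds another, and a predictor is trained to match the target embedding in embedding space rather than reconstructing the input. The mean-pooling encoders, the bias-free linear layers, the exponential-moving-average (EMA) target update, and all function names are illustrative assumptions.

```python
def mean_pool(node_feats, nodes):
    """Average the feature vectors of the selected nodes (a toy subgraph encoder input)."""
    dim = len(node_feats[0])
    return [sum(node_feats[n][d] for n in nodes) / len(nodes) for d in range(dim)]


def linear(weights, x):
    """Apply a dense layer without bias; weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def jepa_loss(node_feats, context_nodes, target_nodes, w_context, w_target, w_pred):
    """Mean squared error between the predicted and actual target embeddings."""
    z_context = linear(w_context, mean_pool(node_feats, context_nodes))
    # In a real training loop no gradient would flow through the target branch.
    z_target = linear(w_target, mean_pool(node_feats, target_nodes))
    z_pred = linear(w_pred, z_context)
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_target)


def ema_update(w_target, w_context, momentum=0.99):
    """Move the target-encoder weights toward the context-encoder weights (assumed EMA rule)."""
    return [
        [momentum * wt + (1 - momentum) * wc for wt, wc in zip(row_t, row_c)]
        for row_t, row_c in zip(w_target, w_context)
    ]
```

In a real setup the encoders would be graph neural networks and the context and target subgraphs would be sampled from each polymer graph. After pretraining, the encoder would be fine-tuned on the small labeled set for property prediction.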