Beyond Learning on Molecules by Weakly Supervising on Molecules
This work addresses the need for more efficient and interpretable molecular encoders in computational chemistry, although its approach is incremental.
The paper tackles the problem of task-dependent molecular representations by using weak supervision on programmatically derived motifs, achieving state-of-the-art performance across molecular property prediction benchmarks.
Molecular representations are inherently task-dependent, yet most pre-trained molecular encoders are not. Task conditioning promises representations that reorganize based on task descriptions, but existing approaches rely on expensive labeled data. We show that weak supervision on programmatically derived molecular motifs is sufficient. Our Adaptive Chemical Embedding Model (ACE-Mol) learns from hundreds of motifs paired with natural language descriptors that are cheap to compute and trivial to scale. Conventional encoders slowly search the embedding space for task-relevant structure, whereas ACE-Mol immediately aligns its representations with the task. ACE-Mol achieves state-of-the-art performance across molecular property prediction benchmarks with interpretable, chemically meaningful representations.
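To make the idea of programmatically derived weak labels concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a toy subset of SMILES (organic-subset atoms, single-digit ring closures) and three hypothetical motifs; the names `MOTIFS` and `weak_labels` are illustrative only. A real setup would likely use a cheminformatics toolkit and a far larger motif set.

```python
import re

# Toy tokenizer for a small SMILES subset (assumption: organic-subset atoms
# and single-digit ring closures only; bonds and branches are ignored).
_TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|\d")

# Hypothetical motifs: natural-language descriptor -> function over tokens.
MOTIFS = {
    "number of halogen atoms": lambda toks: sum(t in {"F", "Cl", "Br", "I"} for t in toks),
    "number of aromatic atoms": lambda toks: sum(t.islower() for t in toks),
    # Each ring uses one opening and one closing digit in this subset.
    "number of rings": lambda toks: sum(t.isdigit() for t in toks) // 2,
}


def weak_labels(smiles):
    """Return (descriptor, smiles, label) triples computed without any assay data."""
    toks = _TOKEN.findall(smiles)
    return [(desc, smiles, fn(toks)) for desc, fn in MOTIFS.items()]


if __name__ == "__main__":
    for triple in weak_labels("c1ccccc1Cl"):  # chlorobenzene
        print(triple)
```

Each triple pairs a task description with a molecule and a label that is cheap to compute, which is the kind of supervision signal a task-conditioned encoder could be trained on in place of expensive experimental labels.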