BM · AI · LG · May 22

A Systematic Evaluation of Co-folding Model Representations for Small-Molecule Learning

arXiv:2602.1324952.31 citations · h-index: 6
AI Analysis

For researchers in drug discovery and molecular machine learning, this work identifies co-folding as a promising pretraining paradigm, though the gains are incremental over existing methods.

This paper evaluates whether protein-ligand co-folding models can produce strong small-molecule representations. Using Boltz2, the authors show that co-folding representations match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization.

Small-molecule foundation models are typically pretrained on standalone molecular data, unlike vision and language models that often benefit from cross-modal or relational supervision. Protein-ligand co-folding provides a molecular analogue of such supervision by exposing models to atom-level ligand-protein interactions, raising the question of whether co-folding models can yield strong small-molecule representations. We study this question using Boltz2, a modern co-folding model, by transferring its atom-level ligand representations to standalone small-molecule tasks. Through systematic probing and distillation, we show that Boltz2 representations match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization. We further find that Boltz2 representations are complementary to those learned from conventional standalone molecular supervision, including 3D conformers, bioassay labels, and quantum-chemical properties. Finally, we extend representation alignment to reinforcement learning, showing that dense representation-level supervision can complement scalar rewards in molecular discovery. These results identify protein-ligand co-folding as a promising pretraining paradigm for small-molecule representation learning and position Boltz2 as a strong, off-the-shelf molecular foundation model.
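
The probing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not call Boltz2: the atom-level embeddings are random stand-ins, and the names pool_atoms, fit_ridge_probe, and predict are hypothetical helpers introduced only for illustration. The sketch assumes that a molecule-level vector is obtained by mean-pooling atom-level features and that a linear (ridge) probe is fitted on top of the frozen features to predict a scalar property.

```python
import numpy as np

def pool_atoms(atom_embeddings):
    """Mean-pool an (n_atoms, dim) array into one (dim,) molecule vector."""
    return np.asarray(atom_embeddings, dtype=float).mean(axis=0)

def fit_ridge_probe(X, y, l2=1e-3):
    """Closed-form ridge regression on frozen features; returns (weights, bias)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + l2 * I) w = Xc^T yc for the probe weights.
    w = np.linalg.solve(Xc.T @ Xc + l2 * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def predict(X, w, b):
    """Apply the linear probe to molecule-level features."""
    return np.asarray(X, dtype=float) @ w + b

# Assumption: random arrays stand in for frozen atom-level ligand embeddings.
rng = np.random.default_rng(0)
dim = 8
molecules = [rng.normal(size=(rng.integers(5, 30), dim)) for _ in range(200)]
X = np.stack([pool_atoms(m) for m in molecules])

# Assumption: a synthetic scalar property that is linear in the pooled features.
true_w = rng.normal(size=dim)
y = X @ true_w + 0.01 * rng.normal(size=len(X))

# Fit the probe on 150 molecules and evaluate on the remaining 50.
w, b = fit_ridge_probe(X[:150], y[:150])
test_mse = float(np.mean((predict(X[150:], w, b) - y[150:]) ** 2))
print(f"held-out MSE: {test_mse:.6f}")
```

Because the encoder stays frozen and only the linear map is trained, the held-out error of such a probe reflects how much task-relevant information the representation already contains, which is the usual motivation for probing before any fine-tuning or distillation.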
