LG · BIO-PH · Jul 1, 2025

BoltzNCE: Learning Likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation

arXiv:2507.00846v3 · 6 citations · h-index: 15 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck of obtaining likelihoods for Boltzmann Generators in molecular modeling, enabling unbiased Boltzmann statistics at scale.

The paper tackles the challenge of efficiently sampling from Boltzmann distributions for complex physical systems such as molecules. It introduces BoltzNCE, which trains an energy-based model to approximate sample likelihoods using noise contrastive estimation combined with score matching. On alanine dipeptide, the method achieves 100× faster inference while closely matching the free energy profiles and energy distributions obtained with exact likelihoods. Trained on multiple dipeptide systems, it also transfers to new systems at inference time, with at least a 6× speedup over standard molecular dynamics.
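As a rough illustration of the noise contrastive estimation idea mentioned above, the sketch below implements the generic binary NCE loss on a 1D toy problem. It is not the BoltzNCE objective from the paper: the Gaussian data and noise distributions and the helper names `log_normal` and `nce_loss` are assumptions made purely for illustration. The loss trains a classifier whose logit is the difference between the model and noise log-densities, so it is lowest when the model log-density matches the data.

```python
import numpy as np

def log_normal(x, mean, std):
    # Log-density of a 1D Gaussian (hypothetical stand-in for a learned model).
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def nce_loss(log_model, log_noise, x_data, x_noise):
    # Logit of "this sample came from the data", assuming equal numbers
    # of data and noise samples.
    logit_data = log_model(x_data) - log_noise(x_data)
    logit_noise = log_model(x_noise) - log_noise(x_noise)
    # -log(sigmoid(z)) = logaddexp(0, -z); -log(1 - sigmoid(z)) = logaddexp(0, z)
    return (np.mean(np.logaddexp(0.0, -logit_data))
            + np.mean(np.logaddexp(0.0, logit_noise)))

rng = np.random.default_rng(0)
x_data = rng.normal(1.0, 1.0, size=20000)   # toy "data" distribution (assumed)
x_noise = rng.normal(0.0, 2.0, size=20000)  # toy noise distribution (assumed)

def log_noise(x):
    return log_normal(x, 0.0, 2.0)

# A model that matches the data should score a lower loss than one that does not.
loss_true = nce_loss(lambda x: log_normal(x, 1.0, 1.0), log_noise, x_data, x_noise)
loss_wrong = nce_loss(lambda x: log_normal(x, -1.0, 1.0), log_noise, x_data, x_noise)
print(loss_true, loss_wrong)
```

In practice the model log-density would be a neural network trained by gradient descent on this loss; the fixed Gaussians here only show how the objective ranks a good model above a poor one.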

Efficient sampling from the Boltzmann distribution given its energy function is a key challenge for modeling complex physical systems such as molecules. Boltzmann Generators address this problem by leveraging continuous normalizing flows to transform a simple prior into a distribution that can be reweighted to match the target using sample likelihoods. Despite the elegance of this approach, obtaining these likelihoods requires computing costly Jacobians during integration, which is impractical for large molecular systems. To overcome this difficulty, we train an energy-based model (EBM) to approximate likelihoods using both noise contrastive estimation (NCE) and score matching, which we show outperforms the use of either objective in isolation. On 2D synthetic systems where failure can be easily visualized, NCE improves mode weighting relative to score matching alone. On alanine dipeptide, our method yields free energy profiles and energy distributions that closely match those obtained using exact likelihoods while achieving $100\times$ faster inference. By training on multiple dipeptide systems, we show that our approach also exhibits effective transfer learning, generalizing to new systems at inference time and achieving at least a $6\times$ speedup over standard molecular dynamics (MD). While many recent efforts in generative modeling have prioritized models with fast sampling, our work demonstrates the design of models with accelerated likelihoods, enabling the application of reweighting schemes that ensure unbiased Boltzmann statistics at scale. Our code is available at https://github.com/RishalAggarwal/BoltzNCE.
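The reweighting step described in the abstract can be sketched with self-normalized importance sampling. This is a minimal toy under stated assumptions, not the authors' implementation: the harmonic energy `U(x) = x^2 / 2` in units of kT, the Gaussian proposal standing in for the generator, and the helper name `reweighted_mean` are all hypothetical. Each sample receives a weight proportional to `exp(-U(x)) / q(x)`, where `q(x)` is the sample likelihood under the generator, which is why fast likelihoods matter.

```python
import numpy as np

def reweighted_mean(observable, log_target_unnorm, log_proposal):
    # Self-normalized importance weights, computed in log space for stability.
    log_w = log_target_unnorm - log_proposal
    log_w = log_w - log_w.max()
    w = np.exp(log_w)
    w = w / w.sum()
    ess = 1.0 / np.sum(w ** 2)  # Kish effective sample size
    return np.sum(w * observable), ess

rng = np.random.default_rng(0)
mean_q, std_q = 0.5, 1.5                      # assumed proposal ("generator")
x = rng.normal(mean_q, std_q, size=50000)

log_q = -0.5 * ((x - mean_q) / std_q) ** 2 - np.log(std_q * np.sqrt(2.0 * np.pi))
log_p_unnorm = -0.5 * x ** 2                  # -U(x) for U(x) = x^2 / 2, kT = 1

naive = np.mean(x ** 2)                       # biased: ignores the weights
estimate, ess = reweighted_mean(x ** 2, log_p_unnorm, log_q)
print(naive, estimate, ess)
```

Under the target distribution the exact value of the observable `x^2` is 1, so the reweighted estimate should land near 1 while the unweighted average stays near 2.5, the value under the proposal. The effective sample size indicates how much of the sample budget survives reweighting.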

