Energy-Based Coarse-Graining in Molecular Dynamics: A Flow-Based Framework without Data
The framework provides a principled, data-free alternative to traditional coarse-graining methods, addressing both the 'chicken-and-egg' challenge and the back-mapping problem in molecular simulations.
The paper tackles the data dependence problem in coarse-grained molecular dynamics by introducing a fully data-free, generative framework that directly targets the all-atom Boltzmann distribution, demonstrating accurate mode capture and atomic reconstruction on synthetic systems and alanine dipeptide.
Coarse-grained (CG) models provide an effective route to reducing the complexity of molecular dynamics (MD) simulations, but conventional approaches depend heavily on long all-atom MD trajectories to adequately sample configurational space. This data dependence limits accuracy and generalizability, as unvisited configurations remain excluded from the resulting CG models. We introduce a fully data-free, generative framework for CG that directly targets the all-atom Boltzmann distribution. The model defines a structured latent space comprising slow collective variables, associated with multimodal marginal densities capturing metastable states, and fast variables, represented through simple, unimodal conditional distributions. A learnable, bijective map from latent space to atomistic coordinates enables the automatic and accurate reconstruction of molecular structures. Training relies solely on the interatomic potential and minimizes the reverse Kullback-Leibler (KL) divergence via an energy-based objective. To stabilize optimization and ensure mode coverage, we employ an adaptive tempering scheme that promotes the exploration of diverse configurations. Once trained, the model can generate independent, one-shot equilibrium samples at full atomic resolution. Validation on two synthetic systems, a double-well potential and a Gaussian mixture model, as well as on the benchmark alanine dipeptide, demonstrates that the method captures all relevant modes of the Boltzmann distribution, reconstructs atomic configurations, and automatically learns physically meaningful CG representations. These results suggest a promising, data-free alternative to traditional CG techniques, offering both a principled approach to addressing the long-standing "chicken-and-egg" challenge in coarse-graining and an effective solution to the back-mapping problem by enabling accurate reconstruction of all-atom configurations.
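To make the energy-based objective concrete, the following is a minimal sketch of a Monte Carlo estimate of the reverse KL divergence for a one-dimensional double-well potential. It is an illustration under stated assumptions, not the paper's implementation: the potential U(x) = (x^2 - 1)^2, the affine map standing in for the learnable bijection, the function names, and the fixed inverse temperature beta (used in place of the adaptive tempering scheme) are all hypothetical choices. For a bijection x = f(z) with base density q_z, the reverse KL equals E_z[log q_z(z) - log|det J_f(z)| + beta * U(f(z))] plus the constant log Z, so it can be evaluated from the potential alone, with no trajectory data.

```python
import numpy as np


def potential(x):
    """Assumed toy double-well potential U(x) = (x^2 - 1)^2, minima at x = -1 and x = +1."""
    return (x**2 - 1.0) ** 2


def flow_forward(z, mu, log_sigma):
    """Affine bijection x = mu + exp(log_sigma) * z (a stand-in for a learnable flow).

    Returns the mapped samples and log|det Jacobian|, which is log_sigma for every sample.
    """
    x = mu + np.exp(log_sigma) * z
    log_det = np.full_like(z, log_sigma)
    return x, log_det


def reverse_kl_estimate(z, mu, log_sigma, beta=1.0):
    """Monte Carlo estimate of KL(q || p_beta), up to the unknown constant log Z.

    Only the potential is evaluated; no samples from the target are required.
    """
    x, log_det = flow_forward(z, mu, log_sigma)
    log_base = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)  # standard normal log-density
    return np.mean(log_base - log_det + beta * potential(x))


rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)  # latent samples from the base distribution
log_sigma = np.log(0.3)

# A flow centred on one well versus one centred far from both wells.
loss_in_well = reverse_kl_estimate(z, mu=1.0, log_sigma=log_sigma)
loss_far = reverse_kl_estimate(z, mu=3.0, log_sigma=log_sigma)

# Tempering with a smaller beta (higher temperature) flattens the energy landscape,
# so the penalty for placing mass in high-energy regions shrinks.
loss_in_well_hot = reverse_kl_estimate(z, mu=1.0, log_sigma=log_sigma, beta=0.1)
loss_far_hot = reverse_kl_estimate(z, mu=3.0, log_sigma=log_sigma, beta=0.1)

print("beta = 1.0 gap:", loss_far - loss_in_well)
print("beta = 0.1 gap:", loss_far_hot - loss_in_well_hot)
```

The unimodal affine map here can cover at most one well, so a minimizer of this loss would collapse onto a single mode. That is the failure the abstract's multimodal marginal over slow variables and its tempering scheme are meant to avoid; the smaller gap at beta = 0.1 gives a rough picture of why a flatter landscape makes exploration easier early in training.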