Maximilian Stupp, P. S. Koutsourelakis
Coarse-grained (CG) models provide an effective route to reducing the complexity of molecular dynamics (MD) simulations, but conventional approaches depend heavily on long all-atom MD trajectories to adequately sample configurational space. This data dependence limits accuracy and generalizability, as configurations not visited in the training trajectories are excluded from the resulting CG models. We introduce a fully data-free, generative framework for coarse-graining that directly targets the all-atom Boltzmann distribution. The model defines a structured latent space comprising slow collective variables, associated with multimodal marginal densities capturing metastable states, and fast variables, represented through simple, unimodal conditional distributions. A learnable, bijective map from latent space to atomistic coordinates enables the automatic and accurate reconstruction of molecular structures. Training relies solely on the interatomic potential and minimizes the reverse Kullback-Leibler (KL) divergence via an energy-based objective. To stabilize optimization and ensure mode coverage, we employ an adaptive tempering scheme that promotes the exploration of diverse configurations. Once trained, the model can generate independent, one-shot equilibrium samples at full atomic resolution. Validation on two synthetic systems, a double-well potential and a Gaussian mixture model, as well as on the benchmark alanine dipeptide, demonstrates that the method captures all relevant modes of the Boltzmann distribution, reconstructs atomic configurations, and automatically learns physically meaningful CG representations. These results suggest a promising, data-free alternative to traditional CG techniques: a principled approach to the long-standing "chicken-and-egg" challenge in coarse-graining and, through accurate reconstruction of all-atom configurations, an effective solution to the back-mapping problem.
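To make the energy-based objective concrete, the following is a minimal sketch, not the authors' implementation. It relies on the identity KL(q || p) = E_q[log q(x) + beta U(x)] + log Z for a Boltzmann target p(x) proportional to exp(-beta U(x)), so the expectation can be estimated from samples of q and evaluations of the potential alone, without any reference trajectories. Everything specific here is an assumption made for illustration: the 1D double-well form U(x) = x^4 - 4x^2, the Gaussian-mixture stand-in for the generative model, and all function names. The inverse temperature beta is a fixed argument, whereas the adaptive tempering scheme mentioned above would adjust it during training.

```python
import math
import random


def double_well_energy(x, a=1.0, b=4.0):
    """Hypothetical 1D double-well potential U(x) = a*x^4 - b*x^2 (illustrative only)."""
    return a * x**4 - b * x**2


def gaussian_log_density(x, mean, std):
    """Log density of a 1D Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def mixture_log_density(x, means, stds, weights):
    """Log density of a 1D Gaussian mixture, computed with log-sum-exp for stability."""
    terms = [
        math.log(w) + gaussian_log_density(x, m, s)
        for m, s, w in zip(means, stds, weights)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def sample_mixture(rng, means, stds, weights):
    """Draw one sample: pick a component by weight, then sample its Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], stds[k])


def reverse_kl_objective(means, stds, weights, beta=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of E_q[log q(x) + beta * U(x)].

    This equals KL(q || p) up to the additive constant log Z, so it needs
    only samples from the model q and evaluations of the potential U.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_mixture(rng, means, stds, weights)
        total += mixture_log_density(x, means, stds, weights) + beta * double_well_energy(x)
    return total / n_samples


if __name__ == "__main__":
    well = math.sqrt(2.0)  # minima of the assumed potential lie at x = +/- sqrt(2)
    one_mode = reverse_kl_objective([well], [0.35], [1.0])
    two_modes = reverse_kl_objective([-well, well], [0.35, 0.35], [0.5, 0.5])
    print("objective, single mode:", one_mode)
    print("objective, both modes: ", two_modes)
```

In this toy setting, a model that covers both wells attains a lower objective than one collapsed onto a single well, by roughly log 2 when the two modes are well separated. A mode-collapsed model is nevertheless a local optimum of the reverse KL, which is one way to see why a tempering scheme (starting from a smaller beta, i.e. a flatter target) can help optimization reach all modes.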