Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data
This work addresses the challenge of accurately characterizing temperature-dependent behaviors in molecular systems for researchers in computational chemistry and biophysics, representing an incremental advance by combining existing methods into a unified framework.
The authors tackled the problem of modeling equilibrium distributions of molecular systems across temperatures by introducing Latent Thermodynamic Flows (LaTF), which integrates representation learning and generative modeling to learn low-dimensional representations and generate distributions beyond training data, achieving results consistent with experimental and computational benchmarks in systems like a RNA tetraloop with data from only two temperatures.
Accurate characterization of the equilibrium distributions of complex molecular systems and their dependence on environmental factors such as temperature is essential for understanding thermodynamic properties and transition mechanisms. Projecting these distributions onto meaningful low-dimensional representations enables interpretability and downstream analysis. Recent advances in generative AI, particularly flow models such as Normalizing Flows (NFs), have shown promise in modeling such distributions, but their scope is limited without tailored representation learning. In this work, we introduce Latent Thermodynamic Flows (LaTF), an end-to-end framework that tightly integrates representation learning and generative modeling. LaTF unifies the State Predictive Information Bottleneck (SPIB) with NFs to simultaneously learn low-dimensional latent representations, referred to as Collective Variables (CVs), classify metastable states, and generate equilibrium distributions across temperatures beyond the training data. The two components of representation learning and generative modeling are optimized jointly, ensuring that the learned latent features capture the system's slow, important degrees of freedom while the generative model accurately reproduces the system's equilibrium behavior. We demonstrate LaTF's effectiveness across diverse systems, including a model potential, the Chignolin protein, and cluster of Lennard Jones particles, with thorough evaluations and benchmarking using multiple metrics and extensive simulations. Finally, we apply LaTF to a RNA tetraloop system, where despite using simulation data from only two temperatures, LaTF reconstructs the temperature-dependent structural ensemble and melting behavior, consistent with experimental and prior extensive computational results.