A survey of probabilistic generative frameworks for molecular simulations
This work provides a taxonomy and numerical guidance for model selection in molecular tasks, but it is incremental as it benchmarks existing methods without introducing new ones.
The authors address the lack of benchmarks for probabilistic generative models in molecular science by evaluating flow-based and diffusion models on datasets with varying dimensionality and complexity. They find that no single framework is best overall. Instead, specific models excel in different scenarios: for example, Neural Spline Flows on low-dimensional asymmetric data and Denoising Diffusion Probabilistic Models on low-dimensional complex data.
Generative artificial intelligence is now a widely used tool in molecular science. Despite the popularity of probabilistic generative models, numerical experiments benchmarking their performance on molecular data are lacking. In this work, we introduce and explain several classes of generative models, broadly sorted into two categories: flow-based models and diffusion models. We select three representative models (Neural Spline Flows, Conditional Flow Matching, and Denoising Diffusion Probabilistic Models) and examine their accuracy, computational cost, and generation speed across datasets with tunable dimensionality, complexity, and modal asymmetry. Our findings are varied, and no single framework is best for all purposes. In a nutshell, (i) Neural Spline Flows do best at capturing mode asymmetry present in low-dimensional data, (ii) Conditional Flow Matching outperforms the other models for high-dimensional data with low complexity, and (iii) Denoising Diffusion Probabilistic Models appear to be the best for low-dimensional data with high complexity. Our datasets include a Gaussian mixture model and the dihedral torsion angle distribution of the Aib\textsubscript{9} peptide, generated via a molecular dynamics simulation. We hope our taxonomy of probabilistic generative frameworks and numerical results may guide model selection for a wide range of molecular tasks.
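To make the idea of a dataset with tunable dimensionality, complexity, and modal asymmetry concrete, the following is a minimal sketch of a Gaussian mixture sampler. It is not the authors' benchmark code. The function name `sample_gaussian_mixture` and the parameters `n_modes`, `asymmetry`, and `spread` are all hypothetical. Using the number of modes as a proxy for complexity and skewed mode weights as a proxy for asymmetry is an assumption made purely for illustration.

```python
import numpy as np


def sample_gaussian_mixture(n_samples, dim=2, n_modes=4, asymmetry=0.0,
                            spread=4.0, seed=0):
    """Draw samples from an isotropic, unit-variance Gaussian mixture.

    Hypothetical parameterization (not taken from the paper):
      dim       -> dimensionality of the data
      n_modes   -> number of mixture components (a proxy for complexity)
      asymmetry -> 0 gives equal mode weights; larger values skew them
      spread    -> scale of the randomly placed mode centers
    """
    rng = np.random.default_rng(seed)

    # Place mode centers at random positions, scaled by `spread`.
    centers = spread * rng.standard_normal((n_modes, dim))

    # Geometrically decaying weights exp(-asymmetry * k), normalized to sum to 1.
    weights = np.exp(-asymmetry * np.arange(n_modes))
    weights /= weights.sum()

    # Pick a mode for each sample, then add unit-variance Gaussian noise.
    labels = rng.choice(n_modes, size=n_samples, p=weights)
    samples = centers[labels] + rng.standard_normal((n_samples, dim))
    return samples, labels, weights


if __name__ == "__main__":
    # Low-dimensional, asymmetric case versus high-dimensional, symmetric case.
    low_d, _, w_low = sample_gaussian_mixture(1000, dim=2, n_modes=3, asymmetry=1.0)
    high_d, _, w_high = sample_gaussian_mixture(1000, dim=50, n_modes=3, asymmetry=0.0)
    print(low_d.shape, w_low)
    print(high_d.shape, w_high)
```

Varying `dim`, `n_modes`, and `asymmetry` independently gives a grid of synthetic targets on which different generative models could be compared, in the spirit of the benchmark the abstract describes.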