Jutta Rogal

STAT-MECH
h-index74
10papers
73citations
Novelty45%
AI Score51

10 Papers

MTRL-SCIMar 17, 2023
Geometric Deep Learning for Molecular Crystal Structure Prediction

Michael Kilgour, Jutta Rogal, Mark Tuckerman

We develop and test new machine learning strategies for accelerating molecular crystal structure ranking and crystal property prediction using tools from geometric deep learning on molecular graphs. Leveraging developments in graph-based learning and the availability of large molecular crystal datasets, we train models for density prediction and stability ranking which are accurate, fast to evaluate, and applicable to molecules of widely varying size and composition. Our density prediction model, MolXtalNet-D, achieves state of the art performance, with lower than 2% mean absolute error on a large and diverse test dataset. Our crystal ranking tool, MolXtalNet-S, correctly discriminates experimental samples from synthetically generated fakes and is further validated through analysis of the submissions to the Cambridge Structural Database Blind Tests 5 and 6. Our new tools are computationally cheap and flexible enough to be deployed within an existing crystal structure prediction pipeline both to reduce the search space and score/filter crystal candidates.

LGApr 15
MolCryst-MLIPs: A Machine-Learned Interatomic Potentials Database for Molecular Crystals

Adam Lahouari, Shen Ai, Jihye Han et al.

We present an open Molecular Crystal (MC) database of Machine-Learned Interatomic Potentials (MLIP) called MolCryst-MLIPs. The first release comprises fine-tuned MACE models for nine molecular crystal systems -- Benzamide, Benzoic acid, Coumarin, Durene, Isonicotinamide, Niacinamide, Nicotinamide, Pyrazinamide, and Resorcinol -- developed using the Automated Machine Learning Pipeline (AMLP), which streamlines the entire MLIP development workflow, from reference data generation to model training and validation, into a reproducible and user-friendly pipeline. Models are fine-tuned from the MACE-MH-1 foundation model (omol head), yielding a mean energy MAE of 0.141 kJ/mol/atom and a mean force MAE of 0.648 kJ/mol/Angstrom across all systems. Dynamical stability and structural integrity, as assessed through energy conservation, P2 orientational order parameters, and radial distribution functions, are evaluated using molecular dynamics simulations. The released models and datasets constitute a growing open database of validated MLIPs, ready for production MD simulations of molecular crystal polymorphism under different thermodynamic conditions.

STAT-MECHDec 30, 2025
Assessing generative modeling approaches for free energy estimates in condensed matter

Maximilian Schebek, Jiajun He, Emil Hoffmann et al. · cambridge

The accurate estimation of free energy differences between two states is a long-standing challenge in molecular simulations. Traditional approaches generally rely on sampling multiple intermediate states to ensure sufficient overlap in phase space and are, consequently, computationally expensive. Several generative-model-based methods have recently addressed this challenge by learning a direct bridge between distributions, bypassing the need for intermediate states. However, it remains unclear which approaches provide the best trade-off between efficiency, accuracy, and scalability. In this work, we systematically review these methods and benchmark selected approaches with a focus on condensed-matter systems. In particular, we investigate the performance of discrete and continuous normalizing flows in the context of targeted free energy perturbation as well as FEAT (Free energy Estimators with Adaptive Transport) together with the escorted Jarzynski equality, using coarse-grained monatomic ice and Lennard-Jones solids as benchmark systems. We evaluate accuracy, data efficiency, computational cost, and scalability with system size. Our results provide a quantitative framework for selecting effective free energy estimation strategies in condensed-phase systems.

LGNov 25, 2025Code
MXtalTools: A Toolkit for Machine Learning on Molecular Crystals

Michael Kilgour, Mark E. Tuckerman, Jutta Rogal

We present MXtalTools, a flexible Python package for the data-driven modelling of molecular crystals, facilitating machine learning studies of the molecular solid state. MXtalTools comprises several classes of utilities: (1) synthesis, collation, and curation of molecule and crystal datasets, (2) integrated workflows for model training and inference, (3) crystal parameterization and representation, (4) crystal structure sampling and optimization, (5) end-to-end differentiable crystal sampling, construction and analysis. Our modular functions can be integrated into existing workflows or combined and used to build novel modelling pipelines. MXtalTools leverages CUDA acceleration to enable high-throughput crystal modelling. The Python code is available open-source on our GitHub page, with detailed documentation on ReadTheDocs.

COMP-PHMar 30
Boltzmann Generators for Condensed Matter via Riemannian Flow Matching

Emil Hoffmann, Maximilian Schebek, Leon Klein et al.

Sampling equilibrium distributions is fundamental to statistical mechanics. While flow matching has emerged as scalable state-of-the-art paradigm for generative modeling, its potential for equilibrium sampling in condensed-phase systems remains largely unexplored. We address this by incorporating the periodicity inherent to these systems into continuous normalizing flows using Riemannian flow matching. The high computational cost of exact density estimation intrinsic to continuous normalizing flows is mitigated by using Hutchinson's trace estimator, utilizing a crucial bias-correction step based on cumulant expansion to render the stochastic estimates suitable for rigorous thermodynamic reweighting. Our approach is validated on monatomic ice, demonstrating the ability to train on systems of unprecedented size and obtain highly accurate free energy estimates without the need for traditional multistage estimators.

LGMay 22, 2024
Multi-Type Point Cloud Autoencoder: A Complete Equivariant Embedding for Molecule Conformation and Pose

Michael Kilgour, Mark Tuckerman, Jutta Rogal

Representations are a foundational component of any modelling protocol, including on molecules and molecular solids. For tasks that depend on knowledge of both molecular conformation and 3D orientation, such as the modelling of molecular dimers, clusters, or condensed phases, we desire a rotatable representation that is provably complete in the types and positions of atomic nuclei and roto-inversion equivariant with respect to the input point cloud. In this paper, we develop, train, and evaluate a new type of autoencoder, molecular O(3) encoding net (Mo3ENet), for multi-type point clouds, for which we propose a new reconstruction loss, capitalizing on a Gaussian mixture representation of the input and output point clouds. Mo3ENet is end-to-end equivariant, meaning the learned representation can be manipulated on O(3), a practical bonus. An appropriately trained Mo3ENet latent space comprises a universal embedding for scalar and vector molecule property prediction tasks, as well as other downstream tasks incorporating the 3D molecular pose, and we demonstrate its fitness on several such tasks.

STAT-MECHSep 29, 2025
Scalable Boltzmann Generators for equilibrium sampling of large-scale materials

Maximilian Schebek, Frank Noé, Jutta Rogal

The use of generative models to sample equilibrium distributions of many-body systems, as first demonstrated by Boltzmann Generators, has attracted substantial interest due to their ability to produce unbiased and uncorrelated samples in `one shot'. Despite their promise and impressive results across the natural sciences, scaling these models to large systems remains a major challenge. In this work, we introduce a Boltzmann Generator architecture that addresses this scalability bottleneck with a focus on applications in materials science. We leverage augmented coupling flows in combination with graph neural networks to base the generation process on local environmental information, while allowing for energy-based training and fast inference. Compared to previous architectures, our model trains significantly faster, requires far less computational resources, and achieves superior sampling efficiencies. Crucially, the architecture is transferable to larger system sizes, which allows for the efficient sampling of materials with simulation cells of unprecedented size. We demonstrate the potential of our approach by applying it to several materials systems, including Lennard-Jones crystals, ice phases of mW water, and the phase diagram of silicon, for system sizes well above one thousand atoms. The trained Boltzmann Generators produce highly accurate equilibrium ensembles for various crystal structures, as well as Helmholtz and Gibbs free energies across a range of system sizes, able to reach scales where finite-size effects become negligible.

STAT-MECHMar 31
Estimating Solvation Free Energies with Boltzmann Generators

Maximilian Schebek, Nikolas M. Froböse, Bettina G. Keller et al.

Accurate calculations of solvation free energies remain a central challenge in molecular simulations, often requiring extensive sampling and numerous alchemical intermediates to ensure sufficient overlap between phase-space distributions of a solute in the gas phase and in solution. Here, we introduce a computational framework based on normalizing flows that directly maps solvent configurations between solutes of different sizes, and compare the accuracy and efficiency to conventional free energy estimates. For a Lennard-Jones solvent, we demonstrate that this approach yields acceptable accuracy in estimating free energy differences for challenging transformations, such as solute growth or increased solute-solute separation, which typically demand multiple intermediate simulation steps along the transformation. Analysis of radial distribution functions indicates that the flow generates physically meaningful solvent rearrangements, substantially enhancing configurational overlap between states in configuration space. These results suggest flow-based models as a promising alternative to traditional free energy estimation methods.

MTRL-SCISep 25, 2025
Automated Machine Learning Pipeline for Training and Analysis Using Large Language Models

Adam Lahouari, Jutta Rogal, Mark E. Tuckerman

Machine learning interatomic potentials (MLIPs) have become powerful tools to extend molecular simulations beyond the limits of quantum methods, offering near-quantum accuracy at much lower computational cost. Yet, developing reliable MLIPs remains difficult because it requires generating high-quality datasets, preprocessing atomic structures, and carefully training and validating models. In this work, we introduce an Automated Machine Learning Pipeline (AMLP) that unifies the entire workflow from dataset creation to model validation. AMLP employs large-language-model agents to assist with electronic-structure code selection, input preparation, and output conversion, while its analysis suite (AMLP-Analysis), based on ASE supports a range of molecular simulations. The pipeline is built on the MACE architecture and validated on acridine polymorphs, where, with a straightforward fine-tuning of a foundation model, mean absolute errors of ~1.7 meV/atom in energies and ~7.0 meV/Å in forces are achieved. The fitted MLIP reproduces DFT geometries with sub-Å accuracy and demonstrates stability during molecular dynamics simulations in the microcanonical and canonical ensembles.

STAT-MECHJun 18, 2024
Efficient mapping of phase diagrams with conditional Boltzmann Generators

Maximilian Schebek, Michele Invernizzi, Frank Noé et al.

The accurate prediction of phase diagrams is of central importance for both the fundamental understanding of materials as well as for technological applications in material sciences. However, the computational prediction of the relative stability between phases based on their free energy is a daunting task, as traditional free energy estimators require a large amount of simulation data to obtain uncorrelated equilibrium samples over a grid of thermodynamic states. In this work, we develop deep generative machine learning models based on the Boltzmann Generator approach for entire phase diagrams, employing normalizing flows conditioned on the thermodynamic states, e.g., temperature and pressure, that they map to. By training a single normalizing flow to transform the equilibrium distribution sampled at only one reference thermodynamic state to a wide range of target temperatures and pressures, we can efficiently generate equilibrium samples across the entire phase diagram. Using a permutation-equivariant architecture allows us, thereby, to treat solid and liquid phases on the same footing. We demonstrate our approach by predicting the solid-liquid coexistence line for a Lennard-Jones system in excellent agreement with state-of-the-art free energy methods while significantly reducing the number of energy evaluations needed.