MTRL-SCI May 30
Manifold Diffusion for Structure Generation of Transition Metal Complexes
Luca Schaufelberger, Kjell Jorner
Transition metal complexes are central to catalysis, drug design, and materials science, with relevant properties strongly sensitive to their three-dimensional geometry. However, the electronic diversity and unconventional bonding environments of transition metal complexes pose a major challenge for accurate structure generation. In this work, we introduce TMCgen, a manifold diffusion machine learning model that efficiently and accurately generates geometries of transition metal complexes. By formulating the diffusion process over the metal-ligand coordination angles, combined with torsional and rotational diffusion of the ligands, TMCgen focuses on the key geometric degrees of freedom of transition metal complexes. TMCgen shows strong performance in generating accurate coordination environments on a diverse set of experimentally derived bioinorganic and organometallic complexes while requiring only a few inference steps, enabling efficient generation. Our results demonstrate the potential of manifold-based generative modeling for data-efficient geometry generation, paving the way for property-conditioned design of transition metal complexes.
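To make the idea of diffusing over angles rather than Cartesian coordinates concrete, here is a minimal sketch of one forward noising step on a set of angles. It is not TMCgen's formulation: the abstract does not specify the noise schedule or the exact manifold, so the wrapped Gaussian noise, the function names, and the `sigma` value are all assumptions chosen for illustration.

```python
import math
import random


def wrap(angle):
    """Map an angle in radians to its principal value in [-pi, pi]."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def noise_angles(angles, sigma, rng):
    """Add Gaussian noise to each angle and wrap the result (assumed noising scheme)."""
    return [wrap(a + rng.gauss(0.0, sigma)) for a in angles]


def angular_difference(a, b):
    """Shortest signed difference a - b, measured along the circle."""
    return wrap(a - b)


# Hypothetical example: three angles perturbed with sigma = 0.5 rad.
rng = random.Random(0)
clean = [0.0, math.pi / 2, -3.0]
noisy = noise_angles(clean, sigma=0.5, rng=rng)
```

Wrapping keeps every sample on the circle, so values near the boundary (such as -3.0 rad above) are treated as neighbours of values just below +pi rather than as distant points.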
LG May 28
Constrained Flow Optimization via Sequential Fine Tuning for Molecular Design
Sven Gutjahr, Riccardo De Santi, Luca Schaufelberger et al.
Adapting generative foundation models, in particular diffusion and flow models, to optimize given reward functions (e.g., binding affinity) while satisfying constraints (e.g., molecular synthesizability) is fundamental for their adoption in real-world scientific discovery applications such as molecular design or protein engineering. While recent works have introduced scalable methods for reward-guided fine-tuning of such models via reinforcement learning and control schemes, it remains an open problem how to trade off reward maximization and constraint satisfaction algorithmically in a reliable and predictable manner. Motivated by this challenge, we first present a rigorous framework for Constrained Generative Optimization, which brings an optimization viewpoint to the introduced adaptation problem and retrieves the relevant task of constrained generation as a sub-case. Then, we introduce Constrained Flow Optimization (CFO), an algorithm that automatically and provably balances reward maximization and constraint satisfaction by reducing the original problem to sequential fine-tuning via established, scalable methods. We provide convergence guarantees for constrained generative optimization and constrained generation via CFO. Finally, we present an experimental evaluation of CFO on both synthetic yet illustrative settings and a molecular design task. Across these evaluations, CFO achieves consistent increases in reward while ensuring high constraint satisfaction, showcasing its practical utility for constrained generative optimization.
LG Mar 21
Bayesian Scattering: A Principled Baseline for Uncertainty on Image Data
Bernardo Fichera, Zarko Ivkovic, Kjell Jorner et al.
Uncertainty quantification for image data is dominated by complex deep learning methods, yet the field lacks an interpretable, mathematically grounded baseline. We propose Bayesian scattering to fill this gap, serving as a first-step baseline akin to the role of Bayesian linear regression for tabular data. Our method couples the wavelet scattering transform (a deep, non-learned feature extractor) with a simple probabilistic head. Because scattering features are derived from geometric principles rather than learned, they avoid overfitting the training distribution. This helps provide sensible uncertainty estimates even under significant distribution shifts. We validate this on diverse tasks, including medical imaging under institution shift, wealth mapping under country-to-country shift, and Bayesian optimization of molecular properties. Our results suggest that Bayesian scattering is a solid baseline for complex uncertainty quantification methods.
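The pairing of a fixed feature extractor with a probabilistic head can be illustrated with a small sketch. The abstract does not say which head the authors use, so this example assumes a conjugate Bayesian linear regression head; the scattering transform itself is replaced by a hypothetical `toy_features` map, and the precisions `alpha` and `beta` are illustrative values.

```python
import numpy as np


def fit_bayesian_linear_head(features, targets, alpha=1e-6, beta=100.0):
    """Weight posterior for y = w . phi + noise.

    Assumes a prior w ~ N(0, I / alpha) and Gaussian noise with precision beta.
    """
    dim = features.shape[1]
    precision = alpha * np.eye(dim) + beta * features.T @ features
    cov = np.linalg.inv(precision)
    mean = beta * cov @ features.T @ targets
    return mean, cov


def predict(features, mean, cov, beta=100.0):
    """Predictive mean and variance for each row of features."""
    pred_mean = features @ mean
    # Noise variance plus phi^T cov phi, computed row by row.
    pred_var = 1.0 / beta + np.einsum("ij,jk,ik->i", features, cov, features)
    return pred_mean, pred_var


def toy_features(x):
    """Hypothetical stand-in for fixed, non-learned scattering coefficients."""
    return np.column_stack([np.ones_like(x), x])


x_train = np.linspace(0.0, 1.0, 20)
y_train = 1.0 + 2.0 * x_train
w_mean, w_cov = fit_bayesian_linear_head(toy_features(x_train), y_train)
mu, var = predict(toy_features(np.array([0.5, 10.0])), w_mean, w_cov)
```

In this toy setting the predictive variance at `x = 10.0`, far from the training range, is larger than at `x = 0.5`, which is the qualitative behaviour one wants from a baseline under distribution shift.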
HC May 15
GEMS -- Guided Evolutionary Molecule Design for Sustainable Chemicals
Coelina Robinson, Franziska Weissbach, Kjell Jorner et al.
Designing safe and sustainable chemicals is critical to combat chemical pollution in our environment. Machine learning (ML) methods have been developed to aid with de novo molecule design. However, data on the environmental impacts of chemical compounds are sparse, resulting in low-fidelity ML oracles and unreliable candidate proposals. Furthermore, generative ML models rely on numerical scoring functions that cannot fully capture the nuanced chemical intuition of expert scientists required for real-world molecular design. We present GEMS, an interactive visual analytics tool that enables domain experts to directly collaborate with a genetic algorithm for molecule design. Users can integrate their expert knowledge to guide the evolutionary process by modifying the scoring function and molecule population without programming knowledge or ML developer support. A usage scenario demonstrates the system's application in designing sustainable antioxidant alternatives. In an interview session with domain scientists, we collected feedback on the usefulness of GEMS.
LG Jan 19
Generating Cyclic Conformers with Flow Matching in Cremer-Pople Coordinates
Luca Schaufelberger, Aline Hartgers, Kjell Jorner
Cyclic molecules are ubiquitous across applications in chemistry and biology. Their restricted conformational flexibility provides structural pre-organization that is key to their function in drug discovery and catalysis. However, reliably sampling the conformer ensembles of ring systems remains challenging. Here, we introduce PuckerFlow, a generative machine learning model that performs flow matching on the Cremer-Pople space, a low-dimensional internal coordinate system capturing the relevant degrees of freedom of rings. Our approach enables generation of valid closed rings by design and demonstrates strong performance in generating conformers that are both diverse and precise. We show that PuckerFlow outperforms other conformer generation methods on nearly all quantitative metrics and illustrate the potential of PuckerFlow for ring systems relevant to chemical applications, particularly in catalysis and drug discovery. This work enables efficient and reliable conformer generation of cyclic structures, paving the way towards modeling structure-property relationships and the property-guided generation of rings across a wide range of applications in chemistry and biology.
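For readers unfamiliar with the Cremer-Pople space, the sketch below computes the standard puckering coordinates of a single ring from ordered Cartesian positions. It follows the usual textbook definition (mean plane from two auxiliary vectors, then a discrete Fourier decomposition of the out-of-plane displacements) and is not taken from PuckerFlow; the function name and the idealized chair geometry are assumptions for illustration. The sign of the odd amplitude and the phase origin depend on the direction and starting atom of the ring traversal.

```python
import math


def cremer_pople(coords):
    """Return (amplitudes, phases, total_q) for a ring of ordered (x, y, z) atoms."""
    n = len(coords)
    # Move the geometric center of the ring to the origin.
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    pts = [[p[k] - center[k] for k in range(3)] for p in coords]

    # Auxiliary vectors whose cross product gives the mean-plane normal.
    r1 = [sum(p[k] * math.sin(2 * math.pi * j / n) for j, p in enumerate(pts)) for k in range(3)]
    r2 = [sum(p[k] * math.cos(2 * math.pi * j / n) for j, p in enumerate(pts)) for k in range(3)]
    normal = [
        r1[1] * r2[2] - r1[2] * r2[1],
        r1[2] * r2[0] - r1[0] * r2[2],
        r1[0] * r2[1] - r1[1] * r2[0],
    ]
    length = math.sqrt(sum(c * c for c in normal))
    normal = [c / length for c in normal]

    # Out-of-plane displacement of every ring atom.
    z = [sum(p[k] * normal[k] for k in range(3)) for p in pts]

    amplitudes, phases = {}, {}
    for m in range(2, (n - 1) // 2 + 1):
        a = math.sqrt(2 / n) * sum(zj * math.cos(2 * math.pi * m * j / n) for j, zj in enumerate(z))
        b = -math.sqrt(2 / n) * sum(zj * math.sin(2 * math.pi * m * j / n) for j, zj in enumerate(z))
        amplitudes[m] = math.hypot(a, b)
        phases[m] = math.atan2(b, a) % (2 * math.pi)
    if n % 2 == 0:
        # Even rings carry one extra signed amplitude without a phase.
        amplitudes[n // 2] = math.sqrt(1 / n) * sum((-1) ** j * zj for j, zj in enumerate(z))

    total_q = math.sqrt(sum(zj * zj for zj in z))
    return amplitudes, phases, total_q


# Idealized chair: a regular hexagon with atoms alternately 0.25 above and below the plane.
chair = [
    (1.4 * math.cos(2 * math.pi * j / 6), 1.4 * math.sin(2 * math.pi * j / 6), 0.25 * (-1) ** j)
    for j in range(6)
]
amps, phis, q_total = cremer_pople(chair)
```

An N-membered ring yields N - 3 puckering coordinates, so a six-membered ring is described by `amps[2]`, `phis[2]`, and `amps[3]`. For the idealized chair above, all puckering sits in `amps[3]`, while `amps[2]` vanishes.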
LG Feb 26, 2025
One Set to Rule Them All: How to Obtain General Chemical Conditions via Bayesian Optimization over Curried Functions
Stefan P. Schmid, Ella Miray Rajaonson, Cher Tian Ser et al.
General parameters are highly desirable in the natural sciences, for example chemical reaction conditions that enable high yields across a range of related transformations. This has a significant practical impact since such general parameters can be transferred to related tasks without the need for laborious and time-intensive re-optimization. While Bayesian optimization (BO) is widely applied to find optimal parameter sets for specific tasks, it has remained underused in experiment planning towards such general optima. In this work, we consider the real-world problem of condition optimization for chemical reactions to study how performing generality-oriented BO can accelerate the identification of general optima, and whether these optima also translate to unseen examples. This is achieved through a careful formulation of the problem as an optimization over curried functions, as well as systematic evaluations of generality-oriented strategies for optimization tasks on real-world experimental data. We find that for generality-oriented optimization, simple myopic optimization strategies that decouple parameter and task selection perform comparably to more complex ones, and that effective optimization is determined simply by an effective exploration of both parameter and task space.
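The curried-function view can be sketched in a few lines: a two-argument outcome function f(x, task) is turned into a function of the parameters x that returns a function over tasks, and a generality score aggregates that inner function across a task set. The sketch below is an illustration only; the toy outcome function, the task values, the mean aggregation, and the exhaustive grid search (used here in place of Bayesian optimization) are all assumptions, not the paper's setup.

```python
from statistics import mean


def curry(f):
    """Turn f(x, task) into x -> (task -> f(x, task))."""
    return lambda x: (lambda task: f(x, task))


def generality(curried_f, x, tasks, aggregate=mean):
    """Aggregate the outcome of parameters x across all tasks."""
    outcome_for = curried_f(x)
    return aggregate(outcome_for(t) for t in tasks)


def toy_yield(x, task):
    """Hypothetical outcome: best when the condition x matches the task's optimum."""
    return 1.0 - (x - task) ** 2


tasks = [0.2, 0.5, 0.8]
candidates = [i / 10 for i in range(11)]
curried_yield = curry(toy_yield)

# Grid search stands in for an optimizer; it picks the most general condition.
best_x = max(candidates, key=lambda x: generality(curried_yield, x, tasks))
```

Swapping `aggregate=mean` for `aggregate=min` changes the notion of generality from average performance to worst-case performance across tasks, without touching the outcome function.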