Paul Hagemann

LG
h-index14
16papers
348citations
Novelty51%
AI Score41

16 Papers

LGMar 8, 2023
Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation

Paul Hagemann, Sophie Mildenberger, Lars Ruthotto et al.

Score-based diffusion models (SBDM) have recently emerged as state-of-the-art approaches for image generation. Existing SBDMs are typically formulated in a finite-dimensional setting, where images are considered as tensors of finite size. This paper develops SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain. In addition to the quest for generating images at ever-higher resolutions, our primary motivation is to create a well-posed infinite-dimensional learning problem that we can discretize consistently on multiple resolution levels. We thereby intend to obtain diffusion models that generalize across different resolution levels and improve the efficiency of the training process. We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting. First, we modify the forward process using trace class operators to ensure that the latent distribution is well-defined in the infinite-dimensional setting and derive the reverse processes for finite-dimensional approximations. Second, we illustrate that approximating the score function with an operator network is beneficial for multilevel training. After deriving the convergence of the discretization and the approximation of multilevel training, we demonstrate some practical benefits of our infinite-dimensional SBDM approach on a synthetic Gaussian mixture example, the MNIST dataset, and a dataset generated from a nonlinear 2D reaction-diffusion equation.

LGMay 24, 2022
PatchNR: Learning from Very Few Images by Patch Normalizing Flow Regularization

Fabian Altekrüger, Alexander Denker, Paul Hagemann et al.

Learning neural networks using only few available information is an important ongoing research topic with tremendous potential for applications. In this paper, we introduce a powerful regularizer for the variational modeling of inverse problems in imaging. Our regularizer, called patch normalizing flow regularizer (patchNR), involves a normalizing flow learned on small patches of very few images. In particular, the training is independent of the considered inverse problem such that the same regularizer can be applied for different forward operators acting on the same class of images. By investigating the distribution of patches versus those of the whole image class, we prove that our model is indeed a MAP approach. Numerical examples for low-dose and limited-angle computed tomography (CT) as well as superresolution of material images demonstrate that our method provides very high quality results. The training set consists of just six images for CT and one image for superresolution. Finally, we combine our patchNR with ideas from internal learning for performing superresolution of natural images directly from the low-resolution observation without knowledge of any high-resolution image.

LGMar 28, 2023
Conditional Generative Models are Provably Robust: Pointwise Guarantees for Bayesian Inverse Problems

Fabian Altekrüger, Paul Hagemann, Gabriele Steidl

Conditional generative models became a very powerful tool to sample from Bayesian inverse problem posteriors. It is well-known in classical Bayesian literature that posterior measures are quite robust with respect to perturbations of both the prior measure and the negative log-likelihood, which includes perturbations of the observations. However, to the best of our knowledge, the robustness of conditional generative models with respect to perturbations of the observations has not been investigated yet. In this paper, we prove for the first time that appropriately learned conditional generative models provide robust results for single observations.

MLOct 4, 2023
Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel

Paul Hagemann, Johannes Hertrich, Fabian Altekrüger et al.

We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.

LGOct 20, 2023
Y-Diagonal Couplings: Approximating Posteriors with Conditional Wasserstein Distances

Jannis Chemseddine, Paul Hagemann, Christian Wald

In inverse problems, many conditional generative models approximate the posterior measure by minimizing a distance between the joint measure and its learned approximation. While this approach also controls the distance between the posterior measures in the case of the Kullback Leibler divergence, it does not hold true for the Wasserstein distance. We will introduce a conditional Wasserstein distance with a set of restricted couplings that equals the expected Wasserstein distance of the posteriors. By deriving its dual, we find a rigorous way to motivate the loss of conditional Wasserstein GANs. We outline conditions under which the vanilla and the conditional Wasserstein distance coincide. Furthermore, we will show numerical examples where training with the conditional Wasserstein distance yields favorable properties for posterior sampling.

MTRL-SCIDec 10, 2025
Transport Novelty Distance: A Distributional Metric for Evaluating Material Generative Models

Paul Hagemann, Simon Müller, Janine George et al.

Recent advances in generative machine learning have opened new possibilities for the discovery and design of novel materials. However, as these models become more sophisticated, the need for rigorous and meaningful evaluation metrics has grown. Existing evaluation approaches often fail to capture both the quality and novelty of generated structures, limiting our ability to assess true generative performance. In this paper, we introduce the Transport Novelty Distance (TNovD) to judge generative models used for materials discovery jointly by the quality and novelty of the generated materials. Based on ideas from Optimal Transport theory, TNovD uses a coupling between the features of the training and generated sets, which is refined into a quality and memorization regime by a threshold. The features are generated from crystal structures using a graph neural network that is trained to distinguish between materials, their augmented counterparts, and differently sized supercells using contrastive learning. We evaluate our proposed metric on typical toy experiments relevant for crystal structure prediction, including memorization, noise injection and lattice deformations. Additionally, we validate the TNovD on the MP20 validation set and the WBM substitution dataset, demonstrating that it is capable of detecting both memorized and low-quality material data. We also benchmark the performance of several popular material generative models. While introduced for materials, our TNovD framework is domain-agnostic and can be adapted for other areas, such as images and molecules.

LGMar 27, 2024
Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching

Jannis Chemseddine, Paul Hagemann, Gabriele Steidl et al.

In inverse problems, many conditional generative models approximate the posterior measure by minimizing a distance between the joint measure and its learned approximation. While this approach also controls the distance between the posterior measures in the case of the Kullback--Leibler divergence, this is in general not hold true for the Wasserstein distance. In this paper, we introduce a conditional Wasserstein distance via a set of restricted couplings that equals the expected Wasserstein distance of the posteriors. Interestingly, the dual formulation of the conditional Wasserstein-1 flow resembles losses in the conditional Wasserstein GAN literature in a quite natural way. We derive theoretical properties of the conditional Wasserstein distance, characterize the corresponding geodesics and velocity fields as well as the flow ODEs. Subsequently, we propose to approximate the velocity fields by relaxing the conditional Wasserstein distance. Based on this, we propose an extension of OT Flow Matching for solving Bayesian inverse problems and demonstrate its numerical advantages on an inverse problem and class-conditional image generation.

CVDec 27, 2023
Learning from small data sets: Patch-based regularizers in inverse problems for image reconstruction

Moritz Piening, Fabian Altekrüger, Johannes Hertrich et al.

The solution of inverse problems is of fundamental interest in medical and astronomical imaging, geophysics as well as engineering and life sciences. Recent advances were made by using methods from machine learning, in particular deep neural networks. Most of these methods require a huge amount of (paired) data and computer capacity to train the networks, which often may not be available. Our paper addresses the issue of learning from small data sets by taking patches of very few images into account. We focus on the combination of model-based and data-driven methods by approximating just the image prior, also known as regularizer in the variational model. We review two methodically different approaches, namely optimizing the maximum log-likelihood of the patch distribution, and penalizing Wasserstein-like discrepancies of whole empirical patch distributions. From the point of view of Bayesian inverse problems, we show how we can achieve uncertainty quantification by approximating the posterior using Langevin Monte Carlo methods. We demonstrate the power of the methods in computed tomography, image super-resolution, and inpainting. Indeed, the approach provides also high-quality results in zero-shot super-resolution, where only a low-resolution image is available. The paper is accompanied by a GitHub repository containing implementations of all methods as well as data examples so that the reader can get their own insight into the performance.

LGAug 25, 2025
Provable Mixed-Noise Learning with Flow-Matching

Paul Hagemann, Robert Gruhlke, Bernhard Stankewitz et al.

We study Bayesian inverse problems with mixed noise, modeled as a combination of additive and multiplicative Gaussian components. While traditional inference methods often assume fixed or known noise characteristics, real-world applications, particularly in physics and chemistry, frequently involve noise with unknown and heterogeneous structure. Motivated by recent advances in flow-based generative modeling, we propose a novel inference framework based on conditional flow matching embedded within an Expectation-Maximization (EM) algorithm to jointly estimate posterior samplers and noise parameters. To enable high-dimensional inference and improve scalability, we use simulation-free ODE-based flow matching as the generative model in the E-step of the EM algorithm. We prove that, under suitable assumptions, the EM updates converge to the true noise parameters in the population limit of infinite observations. Our numerical results illustrate the effectiveness of combining EM inference with flow matching for mixed-noise Bayesian inverse problems.

LGFeb 5, 2024
Mixed Noise and Posterior Estimation with Conditional DeepGEM

Paul Hagemann, Johannes Hertrich, Maren Casfor et al.

Motivated by indirect measurements and applications from nanometrology with a mixed noise model, we develop a novel algorithm for jointly estimating the posterior and the noise parameters in Bayesian inverse problems. We propose to solve the problem by an expectation maximization (EM) algorithm. Based on the current noise parameters, we learn in the E-step a conditional normalizing flow that approximates the posterior. In the M-step, we propose to find the noise parameter updates again by an EM algorithm, which has analytical formulas. We compare the training of the conditional normalizing flow with the forward and reverse KL, and show that our model is able to incorporate information from many measurements, unlike previous approaches.

LGDec 10, 2024
Sampling from Boltzmann densities with physics informed low-rank formats

Paul Hagemann, Janina Schütte, David Sommer et al.

Our method proposes the efficient generation of samples from an unnormalized Boltzmann density by solving the underlying continuity equation in the low-rank tensor train (TT) format. It is based on the annealing path commonly used in MCMC literature, which is given by the linear interpolation in the space of energies. Inspired by Sequential Monte Carlo, we alternate between deterministic time steps from the TT representation of the flow field and stochastic steps, which include Langevin and resampling steps. These adjust the relative weights of the different modes of the target distribution and anneal to the correct path distribution. We showcase the efficiency of our method on multiple numerical examples.

LGMay 19, 2023
Generative Sliced MMD Flows with Riesz Kernels

Johannes Hertrich, Christian Wald, Fabian Altekrüger et al.

Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \|x-y\|^r$, $r \in (0,2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels, which is also known as energy distance, coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by using only a finite number $P$ of slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows by neural networks even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.

LGNov 24, 2021
Generalized Normalizing Flows via Markov Chains

Paul Hagemann, Johannes Hertrich, Gabriele Steidl

Normalizing flows, diffusion normalizing flows and variational autoencoders are powerful generative models. This chapter provides a unified framework to handle these approaches via Markov chains. We consider stochastic normalizing flows as a pair of Markov chains fulfilling some properties and show how many state-of-the-art models for data generation fit into this framework. Indeed numerical simulations show that including stochastic layers improves the expressivity of the network and allows for generating multimodal distributions from unimodal ones. The Markov chains point of view enables us to couple both deterministic layers as invertible neural networks and stochastic layers as Metropolis-Hasting layers, Langevin layers, variational autoencoders and diffusion normalizing flows in a mathematically sound way. Our framework establishes a useful mathematical tool to combine the various approaches.

LGSep 23, 2021
Stochastic Normalizing Flows for Inverse Problems: a Markov Chains Viewpoint

Paul Hagemann, Johannes Hertrich, Gabriele Steidl

To overcome topological constraints and improve the expressiveness of normalizing flow architectures, Wu, Köhler and Noé introduced stochastic normalizing flows which combine deterministic, learnable flow transformations with stochastic sampling methods. In this paper, we consider stochastic normalizing flows from a Markov chain point of view. In particular, we replace transition densities by general Markov kernels and establish proofs via Radon-Nikodym derivatives which allows to incorporate distributions without densities in a sound way. Further, we generalize the results for sampling from posterior distributions as required in inverse problems. The performance of the proposed conditional stochastic normalizing flow is demonstrated by numerical examples.

LGFeb 5, 2021
Invertible Neural Networks versus MCMC for Posterior Reconstruction in Grazing Incidence X-Ray Fluorescence

Anna Andrle, Nando Farchmin, Paul Hagemann et al.

Grazing incidence X-ray fluorescence is a non-destructive technique for analyzing the geometry and compositional parameters of nanostructures appearing e.g. in computer chips. In this paper, we propose to reconstruct the posterior parameter distribution given a noisy measurement generated by the forward model by an appropriately learned invertible neural network. This network resembles the transport map from a reference distribution to the posterior. We demonstrate by numerical comparisons that our method can compete with established Markov Chain Monte Carlo approaches, while being more efficient and flexible in applications.

LGSep 7, 2020
Stabilizing Invertible Neural Networks Using Mixture Models

Paul Hagemann, Sebastian Neumayer

In this paper, we analyze the properties of invertible neural networks, which provide a way of solving inverse problems. Our main focus lies on investigating and controlling the Lipschitz constants of the corresponding inverse networks. Without such an control, numerical simulations are prone to errors and not much is gained against traditional approaches. Fortunately, our analysis indicates that changing the latent distribution from a standard normal one to a Gaussian mixture model resolves the issue of exploding Lipschitz constants. Indeed, numerical simulations confirm that this modification leads to significantly improved sampling quality in multimodal applications.