Gabriele Steidl

LG
h-index51
53papers
961citations
Novelty48%
AI Score54

53 Papers

NANov 10, 2015
Restoration of Manifold-Valued Images by Half-Quadratic Minimization

Ronny Bergmann, Raymond H. Chan, Ralf Hielscher et al.

The paper addresses the generalization of the half-quadratic minimization method for the restoration of images having values in a complete Riemannian manifold. We recall the half-quadratic minimization method using the notation of the c-transform and adapt the algorithm to our special variational setting. We prove the convergence of the method for Hadamard spaces. Extensive numerical examples for images with values on spheres, in the rotation group SO(3) and in the manifold of positive definite matrices demonstrate the excellent performance of the algorithm. In particular, the method with SO(3)-valued data shows promising results for the restoration of images obtained from Electron Backscattered Diffraction which are of interest in material science.

NAOct 26, 2016
A Second Order Non-Smooth Variational Model for Restoring Manifold-Valued Images

Miroslav Bačák, Ronny Bergmann, Gabriele Steidl et al.

We introduce a new non-smooth variational model for the restoration of manifold-valued data which includes second order differences in the regularization term. While such models were successfully applied for real-valued images, we introduce the second order difference and the corresponding variational models for manifold data, which up to now only existed for cyclic data. The approach requires a combination of techniques from numerical analysis, convex optimization and differential geometry. First, we establish a suitable definition of absolute second order differences for signals and images with values in a manifold. Employing this definition, we introduce a variational denoising model based on first and second order differences in the manifold setup. In order to minimize the corresponding functional, we develop an algorithm using an inexact cyclic proximal point algorithm. We propose an efficient strategy for the computation of the corresponding proximal mappings in symmetric spaces utilizing the machinery of Jacobi fields. For the n-sphere and the manifold of symmetric positive definite matrices, we demonstrate the performance of our algorithm in practice. We prove the convergence of the proposed exact and inexact variant of the cyclic proximal point algorithm in Hadamard spaces. These results which are of interest on its own include, e.g., the manifold of symmetric positive definite matrices.

NAMar 23, 2016
A Parallel Douglas Rachford Algorithm for Minimizing ROF-like Functionals on Images with Values in Symmetric Hadamard Manifolds

Ronny Bergmann, Johannes Persch, Gabriele Steidl

We are interested in restoring images having values in a symmetric Hadamard manifold by minimizing a functional with a quadratic data term and a total variation like regularizing term. To solve the convex minimization problem, we extend the Douglas-Rachford algorithm and its parallel version to symmetric Hadamard manifolds. The core of the Douglas-Rachford algorithm are reflections of the functions involved in the functional to be minimized. In the Euclidean setting the reflections of convex lower semicontinuous functions are nonexpansive. As a consequence, convergence results for Krasnoselski-Mann iterations imply the convergence of the Douglas-Rachford algorithm. Unfortunately, this general results does not carry over to Hadamard manifolds, where proper convex lower semicontinuous functions can have expansive reflections. However, splitting our restoration functional in an appropriate way, we have only to deal with special functions namely, several distance-like functions and an indicator functions of a special convex sets. We prove that the reflections of certain distance-like functions on Hadamard manifolds are nonexpansive which is an interesting result on its own. Furthermore, the reflection of the involved indicator function is nonexpansive on Hadamard manifolds with constant curvature so that the Douglas-Rachford algorithm converges here. Several numerical examples demonstrate the advantageous performance of the suggested algorithm compared to other existing methods as the cyclic proximal point algorithm or half-quadratic minimization. Numerical convergence is also observed in our experiments on the Hadamard manifold of symmetric positive definite matrices with the affine invariant metric which does not have a constant curvature.

LGMar 8, 2023
Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation

Paul Hagemann, Sophie Mildenberger, Lars Ruthotto et al.

Score-based diffusion models (SBDM) have recently emerged as state-of-the-art approaches for image generation. Existing SBDMs are typically formulated in a finite-dimensional setting, where images are considered as tensors of finite size. This paper develops SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain. In addition to the quest for generating images at ever-higher resolutions, our primary motivation is to create a well-posed infinite-dimensional learning problem that we can discretize consistently on multiple resolution levels. We thereby intend to obtain diffusion models that generalize across different resolution levels and improve the efficiency of the training process. We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting. First, we modify the forward process using trace class operators to ensure that the latent distribution is well-defined in the infinite-dimensional setting and derive the reverse processes for finite-dimensional approximations. Second, we illustrate that approximating the score function with an operator network is beneficial for multilevel training. After deriving the convergence of the discretization and the approximation of multilevel training, we demonstrate some practical benefits of our infinite-dimensional SBDM approach on a synthetic Gaussian mixture example, the MNIST dataset, and a dataset generated from a nonlinear 2D reaction-diffusion equation.

NAJun 13, 2018
Priors with Coupled First and Second Order Differences for Manifold-Valued Image Processing

Ronny Bergmann, Jan Henrik Fitschen, Johannes Persch et al.

Recently variational models with priors involving first and second order derivatives resp. differences were successfully applied for image restoration. There are several ways to incorporate the derivatives of first and second order into the prior, for example additive coupling or using infimal convolution (IC), as well as the more general model of total generalized variation (TGV). The later two methods give also decompositions of the restored images into image components with distinct "smoothness" properties which are useful in applications. This paper is the first attempt to generalize these models to manifold-valued images. We propose both extrinsic and intrinsic approaches. The extrinsic approach is based on embedding the manifold into an Euclidean space of higher dimension. Models following this approach can be formulated within the Euclidean space with a constraint restricting them to the manifold. Then alternating direction methods of multipliers can be employed for finding minima. However, the components within the infimal convolution or total generalized variation decomposition live in the embedding space rather than on the manifold which makes their interpretation difficult. Therefore we also investigate two intrinsic approaches. For manifolds which are Lie groups we propose three priors which exploit the group operation, an additive one, another with IC coupling and a third TGV like one. For computing the minimizers of the intrinsic models we apply gradient descent algorithms. For general Riemannian manifolds we further define a model for infimal convolution based on the recently developed second order differences. We demonstrate by numerical examples that our approaches works well for the circle, the 2-sphere, the rotation group, and the manifold of positive definite matrices with the affine invariant metric.

LGMay 24, 2022
PatchNR: Learning from Very Few Images by Patch Normalizing Flow Regularization

Fabian Altekrüger, Alexander Denker, Paul Hagemann et al.

Learning neural networks using only few available information is an important ongoing research topic with tremendous potential for applications. In this paper, we introduce a powerful regularizer for the variational modeling of inverse problems in imaging. Our regularizer, called patch normalizing flow regularizer (patchNR), involves a normalizing flow learned on small patches of very few images. In particular, the training is independent of the considered inverse problem such that the same regularizer can be applied for different forward operators acting on the same class of images. By investigating the distribution of patches versus those of the whole image class, we prove that our model is indeed a MAP approach. Numerical examples for low-dose and limited-angle computed tomography (CT) as well as superresolution of material images demonstrate that our method provides very high quality results. The training set consists of just six images for CT and one image for superresolution. Finally, we combine our patchNR with ideas from internal learning for performing superresolution of natural images directly from the low-resolution observation without knowledge of any high-resolution image.

NAApr 19, 2018
Nonlocal Myriad Filters for Cauchy Noise Removal

Friederike Laus, Fabien Pierre, Gabriele Steidl

The contribution of this paper is two-fold. First, we introduce a generalized myriad filter, which is a method to compute the joint maximum likelihood estimator of the location and the scale parameter of the Cauchy distribution. Estimating only the location parameter is known as myriad filter. We propose an efficient algorithm to compute the generalized myriad filter and prove its convergence. Special cases of this algorithm result in the classical myriad filtering, respective an algorithm for estimating only the scale parameter. Based on an asymptotic analysis, we develop a second, even faster generalized myriad filtering technique. Second, we use our new approaches within a nonlocal, fully unsupervised method to denoise images corrupted by Cauchy noise. Special attention is paid to the determination of similar patches in noisy images. Numerical examples demonstrate the excellent performance of our algorithms which have moreover the advantage to be robust with respect to the parameter choice.

CVApr 15, 2022
Lagrangian Motion Magnification with Double Sparse Optical Flow Decomposition

Philipp Flotho, Cosmas Heiss, Gabriele Steidl et al.

Microexpressions are fast and spatially small facial expressions that are difficult to detect. Therefore motion magnification techniques, which aim at amplifying and hence revealing subtle motion in videos, appear useful for handling such expressions. There are basically two main approaches, namely via Eulerian or Lagrangian techniques. While the first one magnifies motion implicitly by operating directly on image pixels, the Lagrangian approach uses optical flow (OF) techniques to extract and magnify pixel trajectories. In this paper, we propose a novel approach for local Lagrangian motion magnification of facial micro-motions. Our contribution is three-fold: first, we fine tune the recurrent all-pairs field transforms (RAFT) for OFs deep learning approach for faces by adding ground truth obtained from the variational dense inverse search (DIS) for OF algorithm applied to the CASME II video set of facial micro expressions. This enables us to produce OFs of facial videos in an efficient and sufficiently accurate way. Second, since facial micro-motions are both local in space and time, we propose to approximate the OF field by sparse components both in space and time leading to a double sparse decomposition. Third, we use this decomposition to magnify micro-motions in specific areas of the face, where we introduce a new forward warping strategy using a triangular splitting of the image grid and barycentric interpolation of the RGB vectors at the corners of the transformed triangles. We demonstrate the feasibility of our approach by various examples.

LGJan 27, 2023
Neural Wasserstein Gradient Flows for Maximum Mean Discrepancies with Riesz Kernels

Fabian Altekrüger, Johannes Hertrich, Gabriele Steidl

Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure as singular measures can become absolutely continuous ones and conversely. In this paper we contribute to the understanding of such flows. We propose to approximate the backward scheme of Jordan, Kinderlehrer and Otto for computing such Wasserstein gradient flows as well as a forward scheme for so-called Wasserstein steepest descent flows by neural networks (NNs). Since we cannot restrict ourselves to absolutely continuous measures, we have to deal with transport plans and velocity plans instead of usual transport maps and velocity fields. Indeed, we approximate the disintegration of both plans by generative NNs which are learned with respect to appropriate loss functions. In order to evaluate the quality of both neural schemes, we benchmark them on the interaction energy. Here we provide analytic formulas for Wasserstein schemes starting at a Dirac measure and show their convergence as the time step size tends to zero. Finally, we illustrate our neural MMD flows by numerical examples.

NAFeb 12, 2019
On the Robust PCA and Weiszfeld's Algorithm

Sebastian Neumayer, Max Nimmer, Simon Setzer et al.

Principal component analysis (PCA) is a powerful standard tool for reducing the dimensionality of data. Unfortunately, it is sensitive to outliers so that various robust PCA variants were proposed in the literature. This paper addresses the robust PCA by successively determining the directions of lines having minimal Euclidean distances from the data points. The corresponding energy functional is not differentiable at a finite number of directions which we call anchor directions. We derive a Weiszfeld-like algorithm for minimizing the energy functional which has several advantages over existing algorithms. Special attention is paid to the careful handling of the anchor directions, where we take the relation between local minima and one-sided derivatives of Lipschitz continuous functions on submanifolds of $\mathbb R^d$ into account. Using ideas for stabilizing the classical Weiszfeld algorithm at anchor points and the Kurdyka-Łojasiewicz property of the energy functional, we prove global convergence of the whole sequence of iterates generated by the algorithm to a critical point of the energy functional. Numerical examples demonstrate the very good performance of our algorithm.

NANov 3, 2018
Regularization of Inverse Problems via Time Discrete Geodesics in Image Spaces

Sebastian Neumayer, Johannes Persch, Gabriele Steidl

This paper addresses the solution of inverse problems in imaging given an additional reference image. We combine a modification of the discrete geodesic path model for image metamorphosis with a variational model,actually the $L^2$-$TV$ model, for image reconstruction. We prove that the space continuous model has a minimizer which depends in a stable way from the input data. Two minimization procedures which alternate over the involved sequences of deformations and images in different ways are proposed. The updates with respect to the image sequence exploit recent algorithms from convex analysis to minimize the $L^2$-$TV$ functional. For the numerical computation we apply a finite difference approach on staggered grids together with a multilevel strategy. We present proof-of-the-concept numerical results for sparse and limited angle computerized tomography as well as for superresolution demonstrating the power of the method.

NAApr 6, 2016
Transport between RGB Images Motivated by Dynamic Optimal Transport

Jan Henrik Fitschen, Friederike Laus, Gabriele Steidl

We propose two models for the interpolation between RGB images based on the dynamic optimal transport model of Benamou and Brenier [8]. While the application of dynamic optimal transport and its extensions to unbalanced transform were examined for gray-values images in various papers, this is the first attempt to generalize the idea to color images. The nontrivial task to incorporate color into the model is tackled by considering RGB images as three-dimensional arrays, where the transport in the RGB direction is performed in a periodic way. Following the approach of Papadakis et al. [35] for gray-value images we propose two discrete variational models, a constrained and a penalized one which can also handle unbalanced transport. We show that a minimizer of our discrete model exists, but it is not unique for some special initial/final images. For minimizing the resulting functionals we apply a primal-dual algorithm. One step of this algorithm requires the solution of a four-dimensional discretized Poisson equation with various boundary conditions in each dimension. For instance, for the penalized approach we have simultaneously zero, mirror and periodic boundary conditions. The solution can be computed efficiently using fast Sin-I, Cos-II and Fourier transforms. Numerical examples demonstrate the meaningfulness of our model.

NAFeb 28, 2018
Strain Analysis by a Total Generalized Variation Regularized Optical Flow Model

Frank Balle, Tilmann Beck, Dietmar Eifler et al.

In this paper we deal with the important problem of estimating the local strain tensor from a sequence of micro-structural images realized during deformation tests of engineering materials. Since the strain tensor is defined via the Jacobian of the displacement field, we propose to compute the displacement field by a variational model which takes care of properties of the Jacobian of the displacement field. In particular we are interested in areas of high strain. The data term of our variational model relies on the brightness invariance property of the image sequence. As prior we choose the second order total generalized variation of the displacement field. This prior splits the Jacobian of the displacement field into a smooth and a non-smooth part. The latter reflects the material cracks. An additional constraint is incorporated to handle physical properties of the non-smooth part for tensile tests. We prove that the resulting convex model has a minimizer and show how a primal-dual method can be applied to find a minimizer. The corresponding algorithm has the advantage that the strain tensor is directly computed within the iteration process. Our algorithm is further equipped with a coarse-to-fine strategy to cope with larger displacements. Numerical examples with simulated and experimental data demonstrate the very good performance of our algorithm. In comparison to state-of-the-art engineering software for strain analysis our method can resolve local phenomena much better.

LGMar 28, 2023
Conditional Generative Models are Provably Robust: Pointwise Guarantees for Bayesian Inverse Problems

Fabian Altekrüger, Paul Hagemann, Gabriele Steidl

Conditional generative models became a very powerful tool to sample from Bayesian inverse problem posteriors. It is well-known in classical Bayesian literature that posterior measures are quite robust with respect to perturbations of both the prior measure and the negative log-likelihood, which includes perturbations of the observations. However, to the best of our knowledge, the robustness of conditional generative models with respect to perturbations of the observations has not been investigated yet. In this paper, we prove for the first time that appropriately learned conditional generative models provide robust results for single observations.

NADec 20, 2018
Recent Advances in Denoising of Manifold-Valued Images

Ronny Bergmann, Friederike Laus, Johannes Persch et al.

Modern signal and image acquisition systems are able to capture data that is no longer real-valued, but may take values on a manifold. However, whenever measurements are taken, no matter whether manifold-valued or not, there occur tiny inaccuracies, which result in noisy data. In this chapter, we review recent advances in denoising of manifold-valued signals and images, where we restrict our attention to variational models and appropriate minimization algorithms. The algorithms are either classical as the subgradient algorithm or generalizations of the half-quadratic minimization method, the cyclic proximal point algorithm, and the Douglas-Rachford algorithm to manifolds. An important aspect when dealing with real-world data is the practical implementation. Here several groups provide software and toolboxes as the Manifold Optimization (Manopt) package and the manifold-valued image restoration toolbox (MVIRT).

NAMay 24, 2019
On the Rotational Invariant $L_1$-Norm PCA

Sebastian Neumayer, Max Nimmer, Simon Setzer et al.

Principal component analysis (PCA) is a powerful tool for dimensionality reduction. Unfortunately, it is sensitive to outliers, so that various robust PCA variants were proposed in the literature. Among them the so-called rotational invariant $L_1$-norm PCA is rather popular. In this paper, we reinterpret this robust method as conditional gradient algorithm and show moreover that it coincides with a gradient descent algorithm on Grassmannian manifolds. Based on this point of view, we prove for the first time convergence of the whole series of iterates to a critical point using the Kurdyka-Łojasiewicz property of the energy functional.

NAMar 12, 2019
Minimal Lipschitz and $\infty$-Harmonic Extensions of Vector-Valued Functions on Finite Graphs

Miroslav Bačák, Johannes Hertrich, Sebastian Neumayer et al.

This paper deals with extensions of vector-valued functions on finite graphs fulfilling distinguished minimality properties. We show that so-called lex and L-lex minimal extensions are actually the same and call them minimal Lipschitz extensions. Then we prove that the solution of the graph $p$-Laplacians converge to these extensions as $p\to \infty$. Furthermore, we examine the relation between minimal Lipschitz extensions and iterated weighted midrange filters and address their connection to $\infty$-Laplacians for scalar-valued functions. A convergence proof for an iterative algorithm proposed by Elmoataz et al.~(2014) for finding the zero of the $\infty$-Laplacian is given. Finally, we present applications in image inpainting.

NAMay 8, 2017
Unsupervised Multi Class Segmentation of 3D Images with Intensity Inhomogeneities

Jan Henrik Fitschen, Katharina Losch, Gabriele Steidl

Intensity inhomogeneities in images constitute a considerable challenge in image segmentation. In this paper we propose a novel biconvex variational model to tackle this task. We combine a total variation approach for multi class segmentation with a multiplicative model to handle the inhomogeneities. Our method assumes that the image intensity is the product of a smoothly varying part and a component which resembles important image structures such as edges. Therefore, we penalize in addition to the total variation of the label assignment matrix a quadratic difference term to cope with the smoothly varying factor. A critical point of our biconvex functional is computed by a modified proximal alternating linearized minimization method (PALM). We show that the assumptions for the convergence of the algorithm are fulfilled by our model. Various numerical examples demonstrate the very good performance of our method. Particular attention is paid to the segmentation of 3D FIB tomographical images which was indeed the motivation of our work.

MLOct 4, 2023
Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel

Paul Hagemann, Johannes Hertrich, Fabian Altekrüger et al.

We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.

NAMay 8, 2018
Morphing of Manifold-Valued Images inspired by Discrete Geodesics in Image Spaces

Sebastian Neumayer, Johannes Persch, Gabriele Steidl

This paper addresses the morphing of manifold-valued images based on the time discrete geodesic paths model of Berkels, Effland and Rumpf 2015. Although for our manifold-valued setting such an interpretation of the energy functional is not available so far, the model is interesting on its own. We prove the existence of a minimizing sequence within the set of $L^2(Ω,\mathcal{H})$ images having values in a finite dimensional Hadamard manifold $\mathcal{H}$ together with a minimizing sequence of admissible diffeomorphisms. To this end, we show that the continuous manifold-valued functions are dense in $L^2(Ω,\mathcal{H})$. We propose a space discrete model based on a finite difference approach on staggered grids, where we focus on the linearized elastic potential in the regularizing term. The numerical minimization alternates between i) the computation of a deformation sequence between given images via the parallel solution of certain registration problems for manifold-valued images, and ii) the computation of an image sequence with fixed first (template) and last (reference) frame based on a given sequence of deformations via the solution of a system of equations arising from the corresponding Euler-Lagrange equation. Numerical examples give a proof of the concept of our ideas.

94.7MLMay 7
Spherical Flows for Sampling Categorical Data

Jannis Chemseddine, Gregor Kornhardt, Gabriele Steidl

We study the problem of learning generative models for discrete sequences in a continuous embedding space. Whereas prior approaches typically operate in Euclidean space or on the probability simplex, we instead work on the sphere $\mathbb S^{d-1}$. There the von Mises-Fisher (vMF) distribution induces a natural noise process and admits a closed-form conditional score. The conditional velocity is in general intractable. Exploiting the radial symmetry of the vMF density we reduce the continuity equation on $\mathbb S^{d-1}$ to a scalar ODE in the cosine similarity, whose unique bounded solution determines the velocity. The marginal velocity and marginal score on $(\mathbb S^{d-1})^L$ both decompose into posterior-weighted tangent sums that differ only by per-token scalar weights. This gives access to both ODE and predictor-corrector (PC) sampling. The posterior is the only learned object, trained by a cross-entropy loss. Experiments compare the vMF path against geodesic and Euclidean alternatives. The combination of vMF and PC sampling significantly improves results on Sudoku and language modeling.

CVAug 27, 2024
T-FAKE: Synthesizing Thermal Images for Facial Landmarking

Philipp Flotho, Moritz Piening, Anna Kukleva et al.

Facial analysis is a key component in a wide range of applications such as healthcare, autonomous driving, and entertainment. Despite the availability of various facial RGB datasets, the thermal modality, which plays a crucial role in life sciences, medicine, and biometrics, has been largely overlooked. To address this gap, we introduce the T-FAKE dataset, a new large-scale synthetic thermal dataset with sparse and dense landmarks. To facilitate the creation of the dataset, we propose a novel RGB2Thermal loss function, which enables the domain-adaptive transfer of RGB faces to thermal style. By utilizing the Wasserstein distance between thermal and RGB patches and the statistical analysis of clinical temperature distributions on faces, we ensure that the generated thermal images closely resemble real samples. Using RGB2Thermal style transfer based on our RGB2Thermal loss function, we create the large-scale synthetic thermal T-FAKE dataset with landmark and segmentation annotations. Leveraging our novel T-FAKE dataset, probabilistic landmark prediction, and label adaptation networks, we demonstrate significant improvements in landmark detection methods on thermal images across different landmark conventions. Our models show excellent performance with both sparse 70-point landmarks and dense 478-point landmark annotations. Moreover, our RGB2Thermal loss leads to notable results in terms of perceptual evaluation and temperature prediction.

86.1LGMar 17
Self-Aware Markov Models for Discrete Reasoning

Gregor Kornhardt, Jannis Chemseddine, Christian Wald et al.

Standard masked discrete diffusion models face limitations in reasoning tasks due to their inability to correct their own mistakes on the masking path. Since they rely on a fixed number of denoising steps, they are unable to adjust their computation to the complexity of a given problem. To address these limitations, we introduce a method based on learning a Markov transition kernel that is trained on its own outputs. This design enables tokens to be remasked, allowing the model to correct its previous mistakes. Furthermore, we do not need a fixed time schedule but use a trained stopping criterion. This allows for adaptation of the number of function evaluations to the difficulty of the reasoning problem. Our adaptation adds two lightweight prediction heads, enabling reuse and fine-tuning of existing pretrained models. On the Sudoku-Extreme dataset we clearly outperform other flow based methods with a validity of 95%. For the Countdown-4 we only need in average of 10 steps to solve almost 96% of them correctly, while many problems can be solved already in 2 steps.

44.8LGMar 17
SympFormer: Accelerated attention blocks via Inertial Dynamics on Density Manifolds

Viktor Stein, Wuchen Li, Gabriele Steidl

Transformers owe much of their empirical success in natural language processing to the self-attention blocks. Recent perspectives interpret attention blocks as interacting particle systems, whose mean-field limits correspond to gradient flows of interaction energy functionals on probability density spaces equipped with Wasserstein-$2$-type metrics. We extend this viewpoint by introducing accelerated attention blocks derived from inertial Nesterov-type dynamics on density spaces. In our proposed architecture, tokens carry both spatial (feature) and velocity variables. The time discretization and the approximation of accelerated density dynamics yield Hamiltonian momentum attention blocks, which constitute the proposed accelerated attention architectures. In particular, for linear self-attention, we show that the attention blocks approximate a Stein variational gradient flow, using a bilinear kernel, of a potential energy. In this setting, we prove that elliptically contoured probability distributions are preserved by the accelerated attention blocks. We present implementable particle-based algorithms and demonstrate that the proposed accelerated attention blocks converge faster than the classical attention blocks while preserving the number of oracle calls.

NAJan 13
Sampling via Stochastic Interpolants by Langevin-based Velocity and Initialization Estimation in Flow ODEs

Chenguang Duan, Yuling Jiao, Gabriele Steidl et al.

We propose a novel method for sampling from unnormalized Boltzmann densities based on a probability-flow ordinary differential equation (ODE) derived from linear stochastic interpolants. The key innovation of our approach is the use of a sequence of Langevin samplers to enable efficient simulation of the flow. Specifically, these Langevin samplers are employed (i) to generate samples from the interpolant distribution at intermediate times and (ii) to construct, starting from these intermediate times, a robust estimator of the velocity field governing the flow ODE. For both applications of the Langevin diffusions, we establish convergence guarantees. Extensive numerical experiments demonstrate the efficiency of the proposed method on challenging multimodal distributions across a range of dimensions, as well as its effectiveness in Bayesian inference tasks.

59.9LGMay 8
Generalized Wasserstein Flow Matching: Transport Plans, Everywhere, All at Once

Moritz Piening, Richard Duong, Gabriele Steidl

Flow matching has recently emerged as a flexible and efficient framework for generative modelling by learning deterministic transport dynamics between probability measures. In this work, we extend flow matching to the space of probability measures over probability measures, introducing a Wasserstein-on-Wasserstein (WoW) formulation. Leveraging the nested Wasserstein geometry, we show that measures over transport plans naturally induce velocity fields that realize metameasure flows. This yields a principled generalization of Wasserstein flow matching via coupled outer and inner transport plans. To address the substantial computational cost of WoW transport, we propose scalable approximations based on sliced and linear Wasserstein distances, enabling efficient training while promoting numerically stable, near-straight trajectories. Our framework unifies and extends existing approaches to point cloud and set generation, providing a practical and theoretically grounded method for generative modelling in WoW spaces.

LGMar 27, 2024
Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching

Jannis Chemseddine, Paul Hagemann, Gabriele Steidl et al.

In inverse problems, many conditional generative models approximate the posterior measure by minimizing a distance between the joint measure and its learned approximation. While this approach also controls the distance between the posterior measures in the case of the Kullback--Leibler divergence, this is in general not hold true for the Wasserstein distance. In this paper, we introduce a conditional Wasserstein distance via a set of restricted couplings that equals the expected Wasserstein distance of the posteriors. Interestingly, the dual formulation of the conditional Wasserstein-1 flow resembles losses in the conditional Wasserstein GAN literature in a quite natural way. We derive theoretical properties of the conditional Wasserstein distance, characterize the corresponding geodesics and velocity fields as well as the flow ODEs. Subsequently, we propose to approximate the velocity fields by relaxing the conditional Wasserstein distance. Based on this, we propose an extension of OT Flow Matching for solving Bayesian inverse problems and demonstrate its numerical advantages on an inverse problem and class-conditional image generation.

CVJul 28, 2016
A Nonlocal Denoising Algorithm for Manifold-Valued Images Using Second Order Statistics

Friederike Laus, Mila Nikolova, Johannes Persch et al. · mila

Nonlocal patch-based methods, in particular the Bayes' approach of Lebrun, Buades and Morel (2013), are considered as state-of-the-art methods for denoising (color) images corrupted by white Gaussian noise of moderate variance. This paper is the first attempt to generalize this technique to manifold-valued images. Such images, for example images with phase or directional entries or with values in the manifold of symmetric positive definite matrices, are frequently encountered in real-world applications. Generalizing the normal law to manifolds is not canonical and different attempts have been considered. Here we focus on a straightforward intrinsic model and discuss the relation to other approaches for specific manifolds. We reinterpret the Bayesian approach of Lebrun et al. (2013) in terms of minimum mean squared error estimation, which motivates our definition of a corresponding estimator on the manifold. With this estimator at hand we present a nonlocal patch-based method for the restoration of manifold-valued images. Various proof of concept examples demonstrate the potential of the proposed algorithm.

MLFeb 7, 2024
Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces

Viktor Stein, Sebastian Neumayer, Nicolaj Rux et al.

Commonly used $f$-divergences of measures, e.g., the Kullback-Leibler divergence, are subject to limitations regarding the support of the involved measures. A remedy is regularizing the $f$-divergence by a squared maximum mean discrepancy (MMD) associated with a characteristic kernel $K$. We use the kernel mean embedding to show that this regularization can be rewritten as the Moreau envelope of some function on the associated reproducing kernel Hilbert space. Then, we exploit well-known results on Moreau envelopes in Hilbert spaces to analyze the MMD-regularized $f$-divergences, particularly their gradients. Subsequently, we use our findings to analyze Wasserstein gradient flows of MMD-regularized $f$-divergences. We provide proof-of-the-concept numerical examples for flows starting from empirical measures. Here, we cover $f$-divergences with infinite and finite recession constants. Lastly, we extend our results to the tight variational formulation of $f$-divergences and numerically compare the resulting flows.

CVDec 27, 2023
Learning from small data sets: Patch-based regularizers in inverse problems for image reconstruction

Moritz Piening, Fabian Altekrüger, Johannes Hertrich et al.

The solution of inverse problems is of fundamental interest in medical and astronomical imaging, geophysics as well as engineering and life sciences. Recent advances were made by using methods from machine learning, in particular deep neural networks. Most of these methods require a huge amount of (paired) data and computer capacity to train the networks, which often may not be available. Our paper addresses the issue of learning from small data sets by taking patches of very few images into account. We focus on the combination of model-based and data-driven methods by approximating just the image prior, also known as regularizer in the variational model. We review two methodically different approaches, namely optimizing the maximum log-likelihood of the patch distribution, and penalizing Wasserstein-like discrepancies of whole empirical patch distributions. From the point of view of Bayesian inverse problems, we show how we can achieve uncertainty quantification by approximating the posterior using Langevin Monte Carlo methods. We demonstrate the power of the methods in computed tomography, image super-resolution, and inpainting. Indeed, the approach provides also high-quality results in zero-shot super-resolution, where only a low-resolution image is available. The paper is accompanied by a GitHub repository containing implementations of all methods as well as data examples so that the reader can get their own insight into the performance.

LGJan 28, 2025
Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans

Christian Wald, Gabriele Steidl

Among generative neural models, flow matching techniques stand out for their simple applicability and good scaling properties. Here, velocity fields of curves connecting a simple latent and a target distribution are learned. Then the corresponding ordinary differential equation can be used to sample from a target distribution, starting in samples from the latent one. This paper reviews from a mathematical point of view different techniques to learn the velocity fields of absolutely continuous curves in the Wasserstein geometry. We show how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. Besides this main goal, we show how flow matching can be used for solving Bayesian inverse problems, where the definition of conditional Wasserstein distances plays a central role. Finally, we briefly address continuous normalizing flows and score matching techniques, which approach the learning of velocity fields of curves from other directions.

APJun 25, 2025
Telegrapher's Generative Model via Kac Flows

Richard Duong, Jannis Chemseddine, Peter K. Friz et al.

We break the mold in flow-based generative modeling by proposing a new model based on the damped wave equation, also known as telegrapher's equation. Similar to the diffusion equation and Brownian motion, there is a Feynman-Kac type relation between the telegrapher's equation and the stochastic Kac process in 1D. The Kac flow evolves stepwise linearly in time, so that the probability flow is Lipschitz continuous in the Wasserstein distance and, in contrast to diffusion flows, the norm of the velocity is globally bounded. Furthermore, the Kac model has the diffusion model as its asymptotic limit. We extend these considerations to a multi-dimensional stochastic process which consists of independent 1D Kac processes in each spatial component. We show that this process gives rise to an absolutely continuous curve in the Wasserstein space and compute the conditional velocity field starting in a Dirac point analytically. Using the framework of flow matching, we train a neural network that approximates the velocity field and use it for sample generation. Our numerical experiments demonstrate the scalability of our approach, and show its advantages over diffusion models.

LGFeb 11, 2025
Joint Metric Space Embedding by Unbalanced OT with Gromov-Wasserstein Marginal Penalization

Florian Beier, Moritz Piening, Robert Beinert et al.

We propose a new approach for unsupervised alignment of heterogeneous datasets, which maps data from two different domains without any known correspondences to a common metric space. Our method is based on an unbalanced optimal transport problem with Gromov-Wasserstein marginal penalization. It can be seen as a counterpart to the recently introduced joint multidimensional scaling method. We prove that there exists a minimizer of our functional and that for penalization parameters going to infinity, the corresponding sequence of minimizers converges to a minimizer of the so-called embedded Wasserstein distance. Our model can be reformulated as a quadratic, multi-marginal, unbalanced optimal transport problem, for which a bi-convex relaxation admits a numerical solver via block-coordinate descent. We provide numerical examples for joint embeddings in Euclidean as well as non-Euclidean spaces.

MLFeb 13, 2024
Transfer Operators from Batches of Unpaired Points via Entropic Transport Kernels

Florian Beier, Hancheng Bi, Clément Sarrazin et al.

In this paper, we are concerned with estimating the joint probability of random variables $X$ and $Y$, given $N$ independent observation blocks $(\boldsymbol{x}^i,\boldsymbol{y}^i)$, $i=1,\ldots,N$, each of $M$ samples $(\boldsymbol{x}^i,\boldsymbol{y}^i) = \bigl((x^i_j, y^i_{σ^i(j)}) \bigr)_{j=1}^M$, where $σ^i$ denotes an unknown permutation of i.i.d. sampled pairs $(x^i_j,y_j^i)$, $j=1,\ldots,M$. This means that the internal ordering of the $M$ samples within an observation block is not known. We derive a maximum-likelihood inference functional, propose a computationally tractable approximation and analyze their properties. In particular, we prove a $Γ$-convergence result showing that we can recover the true density from empirical approximations as the number $N$ of blocks goes to infinity. Using entropic optimal transport kernels, we model a class of hypothesis spaces of density functions over which the inference functional can be minimized. This hypothesis class is particularly suited for approximate inference of transfer operators from data. We solve the resulting discrete minimization problem by a modification of the EMML algorithm to take addional transition probability constraints into account and prove the convergence of this algorithm. Proof-of-concept examples demonstrate the potential of our method.

LGAug 25, 2025
Provable Mixed-Noise Learning with Flow-Matching

Paul Hagemann, Robert Gruhlke, Bernhard Stankewitz et al.

We study Bayesian inverse problems with mixed noise, modeled as a combination of additive and multiplicative Gaussian components. While traditional inference methods often assume fixed or known noise characteristics, real-world applications, particularly in physics and chemistry, frequently involve noise with unknown and heterogeneous structure. Motivated by recent advances in flow-based generative modeling, we propose a novel inference framework based on conditional flow matching embedded within an Expectation-Maximization (EM) algorithm to jointly estimate posterior samplers and noise parameters. To enable high-dimensional inference and improve scalability, we use simulation-free ODE-based flow matching as the generative model in the E-step of the EM algorithm. We prove that, under suitable assumptions, the EM updates converge to the true noise parameters in the population limit of infinite observations. Our numerical results illustrate the effectiveness of combining EM inference with flow matching for mixed-noise Bayesian inverse problems.

LGNov 2, 2024
Optimizing Federated Learning by Entropy-Based Client Selection

Andreas Lutz, Gabriele Steidl, Karsten Müller et al.

Although deep learning has revolutionized domains such as natural language processing and computer vision, its dependence on centralized datasets raises serious privacy concerns. Federated learning addresses this issue by enabling multiple clients to collaboratively train a global deep learning model without compromising their data privacy. However, the performance of such a model degrades under label skew, where the label distribution differs between clients. To overcome this issue, a novel method called FedEntOpt is proposed. In each round, it selects clients to maximize the entropy of the aggregated label distribution, ensuring that the global model is exposed to data from all available classes. Extensive experiments on multiple benchmark datasets show that the proposed method outperforms several state-of-the-art algorithms by up to 6% in classification accuracy under standard settings regardless of the model size, while achieving gains of over 30% in scenarios with low participation rates and client dropout. In addition, FedEntOpt offers the flexibility to be combined with existing algorithms, enhancing their classification accuracy by more than 40%. Importantly, its performance remains unaffected even when differential privacy is applied.

IVOct 11, 2024
Conditional Generative Models for Contrast-Enhanced Synthesis of T1w and T1 Maps in Brain MRI

Moritz Piening, Fabian Altekrüger, Gabriele Steidl et al.

Contrast enhancement by Gadolinium-based contrast agents (GBCAs) is a vital tool for tumor diagnosis in neuroradiology. Based on brain MRI scans of glioblastoma before and after Gadolinium administration, we address enhancement prediction by neural networks with two new contributions. Firstly, we study the potential of generative models, more precisely conditional diffusion and flow matching, for uncertainty quantification in virtual enhancement. Secondly, we examine the performance of T1 scans from quantitive MRI versus T1-weighted scans. In contrast to T1-weighted scans, these scans have the advantage of a physically meaningful and thereby comparable voxel range. To compare network prediction performance of these two modalities with incompatible gray-value scales, we propose to evaluate segmentations of contrast-enhanced regions of interest using Dice and Jaccard scores. Across models, we observe better segmentations with T1 scans than with T1-weighted scans.

LGFeb 5, 2024
Mixed Noise and Posterior Estimation with Conditional DeepGEM

Paul Hagemann, Johannes Hertrich, Maren Casfor et al.

Motivated by indirect measurements and applications from nanometrology with a mixed noise model, we develop a novel algorithm for jointly estimating the posterior and the noise parameters in Bayesian inverse problems. We propose to solve the problem by an expectation maximization (EM) algorithm. Based on the current noise parameters, we learn in the E-step a conditional normalizing flow that approximates the posterior. In the M-step, we propose to find the noise parameter updates again by an EM algorithm, which has analytical formulas. We compare the training of the conditional normalizing flow with the forward and reverse KL, and show that our model is able to incorporate information from many measurements, unlike previous approaches.

MLOct 14, 2025
Adapting Noise to Data: Generative Flows from 1D Processes

Jannis Chemseddine, Gregor Kornhardt, Richard Duong et al.

We introduce a general framework for constructing generative models using one-dimensional noising processes. Beyond diffusion processes, we outline examples that demonstrate the flexibility of our approach. Motivated by this, we propose a novel framework in which the 1D processes themselves are learnable, achieved by parameterizing the noise distribution through quantile functions that adapt to the data. Our construction integrates seamlessly with standard objectives, including Flow Matching and consistency models. Learning quantile-based noise naturally captures heavy tails and compact supports when present. Numerical experiments highlight both the flexibility and the effectiveness of our method.

OCJul 17, 2025
Unsupervised Ground Metric Learning

Janis Auffenberg, Jonas Bresch, Oleh Melnyk et al.

Data classification without access to labeled samples remains a challenging problem. It usually depends on an appropriately chosen distance between features, a topic addressed in metric learning. Recently, Huizing, Cantini and Peyré proposed to simultaneously learn optimal transport (OT) cost matrices between samples and features of the dataset. This leads to the task of finding positive eigenvectors of a certain nonlinear function that maps cost matrices to OT distances. Having this basic idea in mind, we consider both the algorithmic and the modeling part of unsupervised metric learning. First, we examine appropriate algorithms and their convergence. In particular, we propose to use the stochastic random function iteration algorithm and prove that it converges linearly for our setting, although our operators are not paracontractive as it was required for convergence so far. Second, we ask the natural question if the OT distance can be replaced by other distances. We show how Mahalanobis-like distances fit into our considerations. Further, we examine an approach via graph Laplacians. In contrast to the previous settings, we have just to deal with linear functions in the wanted matrices here, so that simple algorithms from linear algebra can be applied.

MLApr 10, 2025
Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows

Nicolaj Rux, Michael Quellmalz, Gabriele Steidl

Negative distance kernels $K(x,y) := - \|x-y\|$ were used in the definition of maximum mean discrepancies (MMDs) in statistics and lead to favorable numerical results in various applications. In particular, so-called slicing techniques for handling high-dimensional kernel summations profit from the simple parameter-free structure of the distance kernel. However, due to its non-smoothness in $x=y$, most of the classical theoretical results, e.g. on Wasserstein gradient flows of the corresponding MMD functional do not longer hold true. In this paper, we propose a new kernel which keeps the favorable properties of the negative distance kernel as being conditionally positive definite of order one with a nearly linear increase towards infinity and a simple slicing structure, but is Lipschitz differentiable now. Our construction is based on a simple 1D smoothing procedure of the absolute value function followed by a Riemann-Liouville fractional integral transform. Numerical results demonstrate that the new kernel performs similarly well as the negative distance kernel in gradient descent methods, but now with theoretical guarantees.

LGJan 13, 2025
Variable Bregman Majorization-Minimization Algorithm and its Application to Dirichlet Maximum Likelihood Estimation

Ségolène Martin, Jean-Christophe Pesquet, Gabriele Steidl et al.

We propose a novel Bregman descent algorithm for minimizing a convex function that is expressed as the sum of a differentiable part (defined over an open set) and a possibly nonsmooth term. The approach, referred to as the Variable Bregman Majorization-Minimization (VBMM) algorithm, extends the Bregman Proximal Gradient method by allowing the Bregman function used in the divergence to adaptively vary at each iteration, provided it satisfies a majorizing condition on the objective function. This adaptive framework enables the algorithm to approximate the objective more precisely at each iteration, thereby allowing for accelerated convergence compared to the traditional Bregman Proximal Gradient descent. We establish the convergence of the VBMM algorithm to a minimizer under mild assumptions on the family of metrics used. Furthermore, we introduce a novel application of both the Bregman Proximal Gradient method and the VBMM algorithm to the estimation of the multidimensional parameters of a Dirichlet distribution through the maximization of its log-likelihood. Numerical experiments confirm that the VBMM algorithm outperforms existing approaches in terms of convergence speed.

LGDec 10, 2024
Sampling from Boltzmann densities with physics informed low-rank formats

Paul Hagemann, Janina Schütte, David Sommer et al.

Our method proposes the efficient generation of samples from an unnormalized Boltzmann density by solving the underlying continuity equation in the low-rank tensor train (TT) format. It is based on the annealing path commonly used in MCMC literature, which is given by the linear interpolation in the space of energies. Inspired by Sequential Monte Carlo, we alternate between deterministic time steps from the TT representation of the flow field and stochastic steps, which include Langevin and resampling steps. These adjust the relative weights of the different modes of the target distribution and anneal to the correct path distribution. We showcase the efficiency of our method on multiple numerical examples.

LGJan 25, 2024
Manifold GCN: Diffusion-based Convolutional Neural Network for Manifold-valued Graphs

Martin Hanik, Gabriele Steidl, Christoph von Tycowicz

We propose two graph neural network layers for graphs with features in a Riemannian manifold. First, based on a manifold-valued graph diffusion equation, we construct a diffusion layer that can be applied to an arbitrary number of nodes and graph connectivity patterns. Second, we model a tangent multilayer perceptron by transferring ideas from the vector neuron framework to our general setting. Both layers are equivariant under node permutations and the feature manifold's isometries. These properties have led to a beneficial inductive bias in many deep-learning tasks. Numerical examples on synthetic data and an Alzheimer's classification application on triangle meshes of the right hippocampus demonstrate the usefulness of our new layers: While they apply to a much broader class of problems, they perform as well as or better than task-specific state-of-the-art networks.

LGNov 24, 2021
Generalized Normalizing Flows via Markov Chains

Paul Hagemann, Johannes Hertrich, Gabriele Steidl

Normalizing flows, diffusion normalizing flows and variational autoencoders are powerful generative models. This chapter provides a unified framework to handle these approaches via Markov chains. We consider stochastic normalizing flows as a pair of Markov chains fulfilling some properties and show how many state-of-the-art models for data generation fit into this framework. Indeed numerical simulations show that including stochastic layers improves the expressivity of the network and allows for generating multimodal distributions from unimodal ones. The Markov chains point of view enables us to couple both deterministic layers as invertible neural networks and stochastic layers as Metropolis-Hasting layers, Langevin layers, variational autoencoders and diffusion normalizing flows in a mathematically sound way. Our framework establishes a useful mathematical tool to combine the various approaches.

LGSep 23, 2021
Stochastic Normalizing Flows for Inverse Problems: a Markov Chains Viewpoint

Paul Hagemann, Johannes Hertrich, Gabriele Steidl

To overcome topological constraints and improve the expressiveness of normalizing flow architectures, Wu, Köhler and Noé introduced stochastic normalizing flows which combine deterministic, learnable flow transformations with stochastic sampling methods. In this paper, we consider stochastic normalizing flows from a Markov chain point of view. In particular, we replace transition densities by general Markov kernels and establish proofs via Radon-Nikodym derivatives which allows to incorporate distributions without densities in a sound way. Further, we generalize the results for sampling from posterior distributions as required in inverse problems. The performance of the proposed conditional stochastic normalizing flow is demonstrated by numerical examples.

LGFeb 5, 2021
Invertible Neural Networks versus MCMC for Posterior Reconstruction in Grazing Incidence X-Ray Fluorescence

Anna Andrle, Nando Farchmin, Paul Hagemann et al.

Grazing incidence X-ray fluorescence is a non-destructive technique for analyzing the geometry and compositional parameters of nanostructures appearing e.g. in computer chips. In this paper, we propose to reconstruct the posterior parameter distribution given a noisy measurement generated by the forward model by an appropriately learned invertible neural network. This network resembles the transport map from a reference distribution to the posterior. We demonstrate by numerical comparisons that our method can compete with established Markov Chain Monte Carlo approaches, while being more efficient and flexible in applications.

OCNov 4, 2020
Convolutional Proximal Neural Networks and Plug-and-Play Algorithms

Johannes Hertrich, Sebastian Neumayer, Gabriele Steidl

In this paper, we introduce convolutional proximal neural networks (cPNNs), which are by construction averaged operators. For filters of full length, we propose a stochastic gradient descent algorithm on a submanifold of the Stiefel manifold to train cPNNs. In case of filters with limited length, we design algorithms for minimizing functionals that approximate the orthogonality constraints imposed on the operators by penalizing the least squares distance to the identity operator. Then, we investigate how scaled cPNNs with a prescribed Lipschitz constant can be used for denoising signals and images, where the achieved quality depends on the Lipschitz constant. Finally, we apply cPNN based denoisers within a Plug-and-Play (PnP) framework and provide convergence results for the corresponding PnP forward-backward splitting algorithm based on an oracle construction.

MLSep 16, 2020
PCA Reduced Gaussian Mixture Models with Applications in Superresolution

Johannes Hertrich, Dang Phoung Lan Nguyen, Jean-Fancois Aujol et al.

Despite the rapid development of computational hardware, the treatment of large and high dimensional data sets is still a challenging problem. This paper provides a twofold contribution to the topic. First, we propose a Gaussian Mixture Model in conjunction with a reduction of the dimensionality of the data in each component of the model by principal component analysis, called PCA-GMM. To learn the (low dimensional) parameters of the mixture model we propose an EM algorithm whose M-step requires the solution of constrained optimization problems. Fortunately, these constrained problems do not depend on the usually large number of samples and can be solved efficiently by an (inertial) proximal alternating linearized minimization algorithm. Second, we apply our PCA-GMM for the superresolution of 2D and 3D material images based on the approach of Sandeep and Jacob. Numerical results confirm the moderate influence of the dimensionality reduction on the overall superresolution result.

NAJul 26, 2018
Linkage between piecewise constant Mumford-Shah model and ROF model and its virtue in image segmentation

Xiaohao Cai, Raymond Chan, Carola-Bibiane Schonlieb et al.

The piecewise constant Mumford-Shah (PCMS) model and the Rudin-Osher-Fatemi (ROF) model are two important variational models in image segmentation and image restoration, respectively. In this paper, we explore a linkage between these models. We prove that for the two-phase segmentation problem a partial minimizer of the PCMS model can be obtained by thresholding the minimizer of the ROF model. A similar linkage is still valid for multiphase segmentation under specific assumptions. Thus it opens a new segmentation paradigm: image segmentation can be done via image restoration plus thresholding. This new paradigm, which circumvents the innate non-convex property of the PCMS model, therefore improves the segmentation performance in both efficiency (much faster than state-of-the-art methods based on PCMS model, particularly when the phase number is high) and effectiveness (producing segmentation results with better quality) due to the flexibility of the ROF model in tackling degraded images, such as noisy images, blurry images or images with information loss. As a by-product of the new paradigm, we derive a novel segmentation method, called thresholded-ROF (T-ROF) method, to illustrate the virtue of managing image segmentation through image restoration techniques. The convergence of the T-ROF method is proved, and elaborate experimental results and comparisons are presented.