Leon Bungert

LG
h-index8
23papers
364citations
Novelty51%
AI Score52

23 Papers

LGMay 19, 2022
Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification

Leo Schwinn, Leon Bungert, An Nguyen et al.

The reliability of neural networks is essential for their use in safety-critical applications. Existing approaches generally aim at improving the robustness of neural networks to either real-world distribution shifts (e.g., common corruptions and perturbations, spatial transformations, and natural adversarial examples) or worst-case distribution shifts (e.g., optimized adversarial examples). In this work, we propose the Decision Region Quantification (DRQ) algorithm to improve the robustness of any differentiable pre-trained model against both real-world and worst-case distribution shifts in the data. DRQ analyzes the robustness of local decision regions in the vicinity of a given data point to make more reliable predictions. We theoretically motivate the DRQ algorithm by showing that it effectively smooths spurious local extrema in the decision surface. Furthermore, we propose an implementation using targeted and untargeted adversarial attacks. An extensive empirical evaluation shows that DRQ increases the robustness of adversarially and non-adversarially trained models against real-world and worst-case distribution shifts on several computer vision benchmark datasets.

PRFeb 22, 2024
Ratio convergence rates for Euclidean first-passage percolation: Applications to the graph infinity Laplacian

Leon Bungert, Jeff Calder, Tim Roith

In this paper we prove the first quantitative convergence rates for the graph infinity Laplace equation for length scales at the connectivity threshold. In the graph-based semi-supervised learning community this equation is also known as Lipschitz learning. The graph infinity Laplace equation is characterized by the metric on the underlying space, and convergence rates follow from convergence rates for graph distances. At the connectivity threshold, this problem is related to Euclidean first passage percolation, which is concerned with the Euclidean distance function $d_{h}(x,y)$ on a homogeneous Poisson point process on $\mathbb{R}^d$, where admissible paths have step size at most $h>0$. Using a suitable regularization of the distance function and subadditivity we prove that ${d_{h_s}(0,se_1)}/ s \to σ$ as $s\to\infty$ almost surely where $σ\geq 1$ is a dimensional constant and $h_s\gtrsim \log(s)^\frac{1}{d}$. A convergence rate is not available due to a lack of approximate superadditivity when $h_s\to \infty$. Instead, we prove convergence rates for the ratio $\frac{d_{h}(0,se_1)}{d_{h}(0,2se_1)}\to \frac{1}{2}$ when $h$ is frozen and does not depend on $s$. Combining this with the techniques that we developed in (Bungert, Calder, Roith, IMA Journal of Numerical Analysis, 2022), we show that this notion of ratio convergence is sufficient to establish uniform convergence rates for solutions of the graph infinity Laplace equation at percolation length scales.

OCOct 9, 2023
Polarized consensus-based dynamics for optimization and sampling

Leon Bungert, Tim Roith, Philipp Wacker

In this paper we propose polarized consensus-based dynamics in order to make consensus-based optimization (CBO) and sampling (CBS) applicable for objective functions with several global minima or distributions with many modes, respectively. For this, we ``polarize'' the dynamics with a localizing kernel and the resulting model can be viewed as a bounded confidence model for opinion formation in the presence of common objective. Instead of being attracted to a common weighted mean as in the original consensus-based methods, which prevents the detection of more than one minimum or mode, in our method every particle is attracted to a weighted mean which gives more weight to nearby particles. We prove that in the mean-field regime the polarized CBS dynamics are unbiased for Gaussian targets. We also prove that in the zero temperature limit and for sufficiently well-behaved strongly convex objectives the solution of the Fokker--Planck equation converges in the Wasserstein-2 distance to a Dirac measure at the minimizer. Finally, we propose a computationally more efficient generalization which works with a predefined number of clusters and improves upon our polarized baseline method for high-dimensional optimization.

APNov 28, 2022
Gamma-convergence of a nonlocal perimeter arising in adversarial machine learning

Leon Bungert, Kerrek Stinson

In this paper we prove Gamma-convergence of a nonlocal perimeter of Minkowski type to a local anisotropic perimeter. The nonlocal model describes the regularizing effect of adversarial training in binary classifications. The energy essentially depends on the interaction between two distributions modelling likelihoods for the associated classes. We overcome typical strict regularity assumptions for the distributions by only assuming that they have bounded $BV$ densities. In the natural topology coming from compactness, we prove Gamma-convergence to a weighted perimeter with weight determined by an anisotropic function of the two densities. Despite being local, this sharp interface limit reflects classification stability with respect to adversarial perturbations. We further apply our results to deduce Gamma-convergence of the associated total variations, to study the asymptotics of adversarial training, and to prove Gamma-convergence of graph discretizations for the nonlocal perimeter.

62.7APApr 20
Duality for the Adversarial Total Variation

Leon Bungert, Lucas Schmitt

Adversarial training of binary classifiers can be reformulated as regularized risk minimization involving a nonlocal total variation. Building on this perspective, we establish a characterization of the subdifferential of this total variation using duality techniques. To achieve this, we derive a dual representation of the nonlocal total variation and a related integration of parts formula, involving a nonlocal gradient and divergence. We provide such duality statements both in the space of continuous functions vanishing at infinity on proper metric spaces and for the space of essentially bounded functions on Euclidean domains. Furthermore, under some additional conditions we provide characterizations of the subdifferential in these settings.

APJul 9, 2024
Convergence rates for Poisson learning to a Poisson equation with measure data

Leon Bungert, Jeff Calder, Max Mihailescu et al.

In this paper we prove discrete to continuum convergence rates for Poisson Learning, a graph-based semi-supervised learning algorithm that is based on solving the graph Poisson equation with a source term consisting of a linear combination of Dirac deltas located at labeled points and carrying label information. The corresponding continuum equation is a Poisson equation with measure data in a Euclidean domain $Ω\subset \mathbb{R}^d$. The singular nature of these equations is challenging and requires an approach with several distinct parts: (1) We prove quantitative error estimates when convolving the measure data of a Poisson equation with (approximately) radial function supported on balls. (2) We use quantitative variational techniques to prove discrete to continuum convergence rates on random geometric graphs with bandwidth $\varepsilon>0$ for bounded source terms. (3) We show how to regularize the graph Poisson equation via mollification with the graph heat kernel, and we study fine asymptotics of the heat kernel on random geometric graphs. Combining these three pillars we obtain $L^1$ convergence rates that scale, up to logarithmic factors, like $O(\varepsilon^{\frac{1}{d+2}})$ for general data distributions, and $O(\varepsilon^{\frac{2-σ}{d+4}})$ for uniformly distributed data, where $σ>0$. These rates are valid with high probability if $\varepsilon\gg\left({\log n}/{n}\right)^q$ where $n$ denotes the number of vertices of the graph and $q \approx \frac{1}{3d}$.

96.1APMay 11
Quantifying Concentration Phenomena of Mean-Field Transformers in the Low-Temperature Regime

Albert Alcalde, Leon Bungert, Konstantin Riedl et al.

Transformers with self-attention modules as their core components have become an integral architecture in modern large language and foundation models. In this paper, we study the evolution of tokens in deep encoder-only transformers at inference time which is described in the large-token limit by a mean-field continuity equation. Leveraging ideas from the convergence analysis of interacting multi-particle systems, with particles corresponding to tokens, we prove that the token distribution rapidly concentrates onto the push-forward of the initial distribution under a projection map induced by the key, query, and value matrices, and remains metastable for moderate times. Specifically, we show that the Wasserstein distance of the two distributions scales like $\sqrt{{\log(β+1)}/β}\exp(Ct)+\exp(-ct)$ in terms of the temperature parameter $β^{-1}\to 0$ and inference time $t\geq 0$. For the proof, we establish Lyapunov-type estimates for the zero-temperature equation, identify its limit as $t\to\infty$, and employ a stability estimate in Wasserstein space together with a quantitative Laplace principle to couple the two equations. Our result implies that for time scales of order $\logβ$ the token distribution concentrates at the identified limiting distribution. Numerical experiments confirm this and, beyond that, complement our theory by showing that for finite $β$ and large $t$ the dynamics enter a different terminal phase, dominated by the spectrum of the value matrix.

LGJun 4, 2021Code
Neural Architecture Search via Bregman Iterations

Leon Bungert, Tim Roith, Daniel Tenbrinck et al.

We propose a novel strategy for Neural Architecture Search (NAS) based on Bregman iterations. Starting from a sparse neural network our gradient-based one-shot algorithm gradually adds relevant parameters in an inverse scale space manner. This allows the network to choose the best architecture in the search space which makes it well-designed for a given task, e.g., by adding neurons or skip connections. We demonstrate that using our approach one can unveil, for instance, residual autoencoders for denoising, deblurring, and classification tasks. Code is available at https://github.com/TimRoith/BregmanLearning.

APApr 22, 2024
A mean curvature flow arising in adversarial training

Leon Bungert, Tim Laux, Kerrek Stinson

We connect adversarial training for binary classification to a geometric evolution equation for the decision boundary. Relying on a perspective that recasts adversarial training as a regularization problem, we introduce a modified training scheme that constitutes a minimizing movements scheme for a nonlocal perimeter functional. We prove that the scheme is monotone and consistent as the adversarial budget vanishes and the perimeter localizes, and as a consequence we rigorously show that the scheme approximates a weighted mean curvature flow. This highlights that the efficacy of adversarial training may be due to locally minimizing the length of the decision boundary. In our analysis, we introduce a variety of tools for working with the subdifferential of a supremal-type nonlocal total variation and its regularity properties.

LGFeb 3
Sparse Training of Neural Networks based on Multilevel Mirror Descent

Yannick Lunk, Sebastian J. Scott, Leon Bungert

We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror descent that exploits the naturally incurred sparsity by alternating between periods of static and dynamic sparsity pattern updates. The key idea is to combine sparsity-inducing Bregman iterations with adaptive freezing of the network structure to enable efficient exploration of the sparse parameter space while maintaining sparsity. We provide convergence guaranties by embedding our method in a multilevel optimization framework. Furthermore, we empirically show that our algorithm can produce highly sparse and accurate models on standard benchmarks. We also show that the theoretical number of FLOPs compared to SGD training can be reduced from 38% for standard Bregman iterations to 6% for our method while maintaining test accuracy.

OCJan 21, 2025
MirrorCBO: A consensus-based optimization method in the spirit of mirror descent

Leon Bungert, Franca Hoffmann, Dohyeon Kim et al.

In this work we propose MirrorCBO, a consensus-based optimization (CBO) method which generalizes standard CBO in the same way that mirror descent generalizes gradient descent. For this we apply the CBO methodology to a swarm of dual particles and retain the primal particle positions by applying the inverse of the mirror map, which we parametrize as the subdifferential of a strongly convex function $φ$. In this way, we combine the advantages of a derivative-free non-convex optimization algorithm with those of mirror descent. As a special case, the method extends CBO to optimization problems with convex constraints. Assuming bounds on the Bregman distance associated to $φ$, we provide asymptotic convergence results for MirrorCBO with explicit exponential rate. Another key contribution is an exploratory numerical study of this new algorithm across different application settings, focusing on (i) sparsity-inducing optimization, and (ii) constrained optimization, demonstrating the competitive performance of MirrorCBO. We observe empirically that the method can also be used for optimization on (non-convex) submanifolds of Euclidean space, can be adapted to mirrored versions of other recent CBO variants, and that it inherits from mirror descent the capability to select desirable minimizers, like sparse ones. We also include an overview of recent CBO approaches for constrained optimization and compare their performance to MirrorCBO.

OCJun 30, 2025
Consensus-based optimization for closed-box adversarial attacks and a connection to evolution strategies

Tim Roith, Leon Bungert, Philipp Wacker

Consensus-based optimization (CBO) has established itself as an efficient gradient-free optimization scheme, with attractive mathematical properties, such as mean-field convergence results for non-convex loss functions. In this work, we study CBO in the context of closed-box adversarial attacks, which are imperceptible input perturbations that aim to fool a classifier, without accessing its gradient. Our contribution is to establish a connection between the so-called consensus hopping as introduced by Riedl et al. and natural evolution strategies (NES) commonly applied in the context of adversarial attacks and to rigorously relate both methods to gradient-based optimization schemes. Beyond that, we provide a comprehensive experimental study that shows that despite the conceptual similarities, CBO can outperform NES and other evolutionary strategies in certain scenarios.

NAFeb 20, 2025
Meshless Shape Optimization using Neural Networks and Partial Differential Equations on Graphs

Eloi Martinet, Leon Bungert

Shape optimization involves the minimization of a cost function defined over a set of shapes, often governed by a partial differential equation (PDE). In the absence of closed-form solutions, one relies on numerical methods to approximate the solution. The level set method -- when coupled with the finite element method -- is one of the most versatile numerical shape optimization approaches but still suffers from the limitations of most mesh-based methods. In this work, we present a fully meshless level set framework that leverages neural networks to parameterize the level set function and employs the graph Laplacian to approximate the underlying PDE. Our approach enables precise computations of geometric quantities such as surface normals and curvature, and allows tackling optimization problems within the class of convex shapes.

LGMay 30, 2023
It begins with a boundary: A geometric view on probabilistically robust learning

Leon Bungert, Nicolás García Trillos, Matt Jacobs et al.

Although deep neural networks have achieved super-human performance on many classification tasks, they often exhibit a worrying lack of robustness towards adversarially generated examples. Thus, considerable effort has been invested into reformulating standard Risk Minimization (RM) into an adversarially robust framework. Recently, attention has shifted towards approaches which interpolate between the robustness offered by adversarial training and the higher clean accuracy and faster training times of RM. In this paper, we take a fresh and geometric view on one such method -- Probabilistically Robust Learning (PRL). We propose a mathematical framework for understanding PRL, which allows us to identify geometric pathologies in its original formulation and to introduce a family of probabilistic nonlocal perimeter functionals to rectify them. We prove existence of solutions to the original and modified problems using novel relaxation methods and also study properties, as well as local limits, of the introduced perimeters. We also clarify, through a suitable $Γ$-convergence analysis, the way in which the original and modified PRL models interpolate between risk minimization and adversarial training.

LGNov 26, 2021
The Geometry of Adversarial Training in Binary Classification

Leon Bungert, Nicolás García Trillos, Ryan Murray

We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type $L^1+$ (nonlocal) $\operatorname{TV}$, a form frequently studied in image analysis and graph-based learning. A rich geometric structure is revealed by this reformulation which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense), and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization problems provides a novel, directly interpretable, statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.

NANov 24, 2021
Uniform Convergence Rates for Lipschitz Learning on Graphs

Leon Bungert, Jeff Calder, Tim Roith

Lipschitz learning is a graph-based semi-supervised learning method where one extends labels from a labeled to an unlabeled data set by solving the infinity Laplace equation on a weighted graph. In this work we prove uniform convergence rates for solutions of the graph infinity Laplace equation as the number of vertices grows to infinity. Their continuum limits are absolutely minimizing Lipschitz extensions with respect to the geodesic metric of the domain where the graph vertices are sampled from. We work under very general assumptions on the graph weights, the set of labeled vertices, and the continuum domain. Our main contribution is that we obtain quantitative convergence rates even for very sparsely connected graphs, as they typically appear in applications like semi-supervised learning. In particular, our framework allows for graph bandwidths down to the connectivity radius. For proving this we first show a quantitative convergence statement for graph distance functions to geodesic distance functions in the continuum. Using the "comparison with distance functions" principle, we can pass these convergence statements to infinity harmonic functions and absolutely minimizing Lipschitz extensions.

LGMay 10, 2021
A Bregman Learning Framework for Sparse Neural Networks

Leon Bungert, Tim Roith, Daniel Tenbrinck et al.

We propose a learning framework based on stochastic Bregman iterations, also known as mirror descent, to train sparse neural networks with an inverse scale space approach. We derive a baseline algorithm called LinBreg, an accelerated version using momentum, and AdaBreg, which is a Bregmanized generalization of the Adam algorithm. In contrast to established methods for sparse training the proposed family of algorithms constitutes a regrowth strategy for neural networks that is solely optimization-based without additional heuristics. Our Bregman learning framework starts the training with very few initial parameters, successively adding only significant ones to obtain a sparse and expressive network. The proposed approach is extremely easy and efficient, yet supported by the rich mathematical theory of inverse scale space methods. We derive a statistically profound sparse parameter initialization strategy and provide a rigorous stochastic convergence analysis of the loss decay and additional convergence proofs in the convex regime. Using only 3.4% of the parameters of ResNet-18 we achieve 90.2% test accuracy on CIFAR-10, compared to 93.6% using the dense network. Our algorithm also unveils an autoencoder architecture for a denoising task. The proposed framework also has a huge potential for integrating sparse backpropagation and resource-friendly training.

LGMar 23, 2021
CLIP: Cheap Lipschitz Training of Neural Networks

Leon Bungert, René Raab, Tim Roith et al.

Despite the large success of deep neural networks (DNN) in recent years, most neural networks still lack mathematical guarantees in terms of stability. For instance, DNNs are vulnerable to small or even imperceptible input perturbations, so called adversarial examples, that can cause false predictions. This instability can have severe consequences in applications which influence the health and safety of humans, e.g., biomedical imaging or autonomous driving. While bounding the Lipschitz constant of a neural network improves stability, most methods rely on restricting the Lipschitz constants of each layer which gives a poor bound for the actual Lipschitz constant. In this paper we investigate a variational regularization method named CLIP for controlling the Lipschitz constant of a neural network, which can easily be integrated into the training procedure. We mathematically analyze the proposed model, in particular discussing the impact of the chosen regularization parameter on the output of the network. Finally, we numerically evaluate our method on both a nonlinear regression problem and the MNIST and Fashion-MNIST classification databases, and compare our results with a weight regularization approach.

LGFeb 24, 2021
Identifying Untrustworthy Predictions in Neural Networks by Geometric Gradient Analysis

Leo Schwinn, An Nguyen, René Raab et al.

The susceptibility of deep neural networks to untrustworthy predictions, including out-of-distribution (OOD) data and adversarial examples, still prevent their widespread use in safety-critical applications. Most existing methods either require a re-training of a given model to achieve robust identification of adversarial attacks or are limited to out-of-distribution sample detection only. In this work, we propose a geometric gradient analysis (GGA) to improve the identification of untrustworthy predictions without retraining of a given model. GGA analyzes the geometry of the loss landscape of neural networks based on the saliency maps of their respective input. To motivate the proposed approach, we provide theoretical connections between gradients' geometrical properties and local minima of the loss function. Furthermore, we demonstrate that the proposed method outperforms prior approaches in detecting OOD data and adversarial attacks, including state-of-the-art and adaptive attacks.

LGDec 7, 2020
Continuum Limit of Lipschitz Learning on Graphs

Tim Roith, Leon Bungert

Tackling semi-supervised learning problems with graph-based methods has become a trend in recent years since graphs can represent all kinds of data and provide a suitable framework for studying continuum limits, e.g., of differential operators. A popular strategy here is $p$-Laplacian learning, which poses a smoothness condition on the sought inference function on the set of unlabeled data. For $p<\infty$ continuum limits of this approach were studied using tools from $Γ$-convergence. For the case $p=\infty$, which is referred to as Lipschitz learning, continuum limits of the related infinity-Laplacian equation were studied using the concept of viscosity solutions. In this work, we prove continuum limits of Lipschitz learning using $Γ$-convergence. In particular, we define a sequence of functionals which approximate the largest local Lipschitz constant of a graph function and prove $Γ$-convergence in the $L^\infty$-topology to the supremum norm of the gradient as the graph becomes denser. Furthermore, we show compactness of the functionals which implies convergence of minimizers. In our analysis we allow a varying set of labeled data which converges to a general closed set in the Hausdorff distance. We apply our results to nonlinear ground states, i.e., minimizers with constrained $L^p$-norm, and, as a by-product, prove convergence of graph distance functions to geodesic distance functions.

IVApr 1, 2020
Robust Image Reconstruction with Misaligned Structural Information

Leon Bungert, Matthias J. Ehrhardt

Multi-modality (or multi-channel) imaging is becoming increasingly important and more widely available, e.g. hyperspectral imaging in remote sensing, spectral CT in material sciences as well as multi-contrast MRI and PET-MR in medicine. Research in the last decades resulted in a plethora of mathematical methods to combine data from several modalities. State-of-the-art methods, often formulated as variational regularization, have shown to significantly improve image reconstruction both quantitatively and qualitatively. Almost all of these models rely on the assumption that the modalities are perfectly registered, which is not the case in most real world applications. We propose a variational framework which jointly performs reconstruction and registration, thereby overcoming this hurdle. Our approach is the first to achieve this for different modalities and outranks established approaches in terms of accuracy of both reconstruction and registration. Numerical results on simulated and real data show the potential of the proposed strategy for various applications in multi-contrast MRI, PET-MR, and hyperspectral imaging: typical misalignments between modalities such as rotations, translations, zooms can be effectively corrected during the reconstruction process. Therefore the proposed framework allows the robust exploitation of shared information across multiple modalities under real conditions.

NAFeb 27, 2019
Computing Nonlinear Eigenfunctions via Gradient Flow Extinction

Leon Bungert, Martin Burger, Daniel Tenbrinck

In this work we investigate the computation of nonlinear eigenfunctions via the extinction profiles of gradient flows. We analyze a scheme that recursively subtracts such eigenfunctions from given data and show that this procedure yields a decomposition of the data into eigenfunctions in some cases as the 1-dimensional total variation, for instance. We discuss results of numerical experiments in which we use extinction profiles and the gradient flow for the task of spectral graph clustering as used, e.g., in machine learning applications.

CVOct 4, 2017
Blind Image Fusion for Hyperspectral Imaging with the Directional Total Variation

Leon Bungert, David A. Coomes, Matthias J. Ehrhardt et al.

Hyperspectral imaging is a cutting-edge type of remote sensing used for mapping vegetation properties, rock minerals and other materials. A major drawback of hyperspectral imaging devices is their intrinsic low spatial resolution. In this paper, we propose a method for increasing the spatial resolution of a hyperspectral image by fusing it with an image of higher spatial resolution that was obtained with a different imaging modality. This is accomplished by solving a variational problem in which the regularization functional is the directional total variation. To accommodate for possible mis-registrations between the two images, we consider a non-convex blind super-resolution problem where both a fused image and the corresponding convolution kernel are estimated. Using this approach, our model can realign the given images if needed. Our experimental results indicate that the non-convexity is negligible in practice and that reliable solutions can be computed using a variety of different optimization algorithms. Numerical results on real remote sensing data from plant sciences and urban monitoring show the potential of the proposed method and suggests that it is robust with respect to the regularization parameters, mis-registration and the shape of the kernel.