Gitta Kutyniok

LG
h-index 55
100 papers
3,160 citations
Novelty 48%
AI Score 58

100 Papers

CL Oct 25, 2023 Code
SuperHF: Supervised Iterative Learning from Human Feedback

Gabriel Mukobi, Peter Chatain, Su Fong et al.

While large language models demonstrate remarkable capabilities, they often present challenges in terms of safety, alignment with human values, and stability during training. Here, we focus on two prevalent methods used to align these models, Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). SFT is simple and robust, powering a host of open-source models, while RLHF is a more sophisticated method used in top-tier models like ChatGPT but also suffers from instability and susceptibility to reward hacking. We propose a novel approach, Supervised Iterative Learning from Human Feedback (SuperHF), which seeks to leverage the strengths of both methods. Our hypothesis is two-fold: that the reward model used in RLHF is critical for efficient data use and model generalization and that the use of Proximal Policy Optimization (PPO) in RLHF may not be necessary and could contribute to instability issues. SuperHF replaces PPO with a simple supervised loss and a Kullback-Leibler (KL) divergence prior. It creates its own training data by repeatedly sampling a batch of model outputs and filtering them through the reward model in an online learning regime. We then break down the reward optimization problem into three components: robustly optimizing the training rewards themselves, preventing reward hacking -- exploitation of the reward model that degrades model performance -- as measured by a novel METEOR similarity metric, and maintaining good performance on downstream evaluations. Our experimental results show SuperHF exceeds PPO-based RLHF on the training objective, easily and favorably trades off high reward with low reward hacking, improves downstream calibration, and performs the same on our GPT-4 based qualitative evaluation scheme all the while being significantly simpler to implement, highlighting SuperHF's potential as a competitive language model alignment technique.
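The sample-filter-fit loop described in this abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: the four-outcome categorical policy, the stand-in reward function, the name `superhf_style_step`, the hyperparameters, and the direction of the KL term are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def superhf_style_step(logits, ref_probs, reward, rng,
                       batch=16, keep=4, lr=0.5, beta=0.1):
    """One toy update: sample, filter by reward, fit with a KL penalty.

    Assumption: the policy is a categorical distribution over a handful of
    outputs, and the penalty is KL(policy || reference).
    """
    n = len(logits)
    probs = softmax(logits)
    # 1) Sample a batch of outputs from the current policy.
    samples = rng.choices(range(n), weights=probs, k=batch)
    # 2) Filter: keep the outputs the (hypothetical) reward model scores highest.
    best = sorted(samples, key=reward, reverse=True)[:keep]
    target = [best.count(k) / keep for k in range(n)]
    # 3) Gradient of the mean negative log-likelihood on the kept samples
    #    plus beta times the gradient of KL(policy || reference), w.r.t. logits.
    log_ratio = [math.log(probs[k] / ref_probs[k]) for k in range(n)]
    kl = sum(probs[k] * log_ratio[k] for k in range(n))
    grad = [(probs[k] - target[k]) + beta * probs[k] * (log_ratio[k] - kl)
            for k in range(n)]
    return [logits[k] - lr * grad[k] for k in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = softmax([0.0] * 4)      # frozen starting policy
    logits = [0.0] * 4
    for _ in range(50):
        logits = superhf_style_step(logits, reference, lambda k: float(k), rng)
    print([round(p, 3) for p in softmax(logits)])
```

Raising `beta` in this toy keeps the policy closer to the reference distribution, which mirrors the trade-off between reward and drift that the abstract mentions.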

AI Oct 9, 2023 Code
ParFam -- (Neural Guided) Symbolic Regression Based on Continuous Global Optimization

Philipp Scholl, Katharina Bieker, Hillary Hauger et al.

The problem of symbolic regression (SR) arises in many different applications, such as identifying physical laws or deriving mathematical equations describing the behavior of financial markets from given data. Various methods exist to address the problem of SR, often based on genetic programming. However, these methods are usually complicated and involve various hyperparameters. In this paper, we present our new approach ParFam that utilizes parametric families of suitable symbolic functions to translate the discrete symbolic regression problem into a continuous one, resulting in a more straightforward setup compared to current state-of-the-art methods. In combination with a global optimizer, this approach results in a highly effective method to tackle the problem of SR. We theoretically analyze the expressivity of ParFam and demonstrate its performance with extensive numerical experiments based on the common SR benchmark suite SRBench, showing that we achieve state-of-the-art results. Moreover, we present an extension incorporating a pre-trained transformer network DL-ParFam to guide ParFam, accelerating the optimization process by up to two orders of magnitude. Our code and results can be found at https://github.com/Philipp238/parfam.

FA Apr 18, 2010
Microlocal Analysis of the Geometric Separation Problem

David L. Donoho, Gitta Kutyniok

Image data are often composed of two or more geometrically distinct constituents; in galaxy catalogs, for instance, one sees a mixture of pointlike structures (galaxy superclusters) and curvelike structures (filaments). It would be ideal to process a single image and extract two geometrically `pure' images, each one containing features from only one of the two geometric constituents. This seems to be a seriously underdetermined problem, but recent empirical work achieved highly persuasive separations. We present a theoretical analysis showing that accurate geometric separation of point and curve singularities can be achieved by minimizing the $\ell_1$ norm of the representing coefficients in two geometrically complementary frames: wavelets and curvelets. Driving our analysis is a specific property of the ideal (but unachievable) representation where each content type is expanded in the frame best adapted to it. This ideal representation has the property that important coefficients are clustered geometrically in phase space, and that at fine scales, there is very little coherence between a cluster of elements in one frame expansion and individual elements in the complementary frame. We formally introduce notions of cluster coherence and clustered sparsity and use this machinery to show that the underdetermined systems of linear equations can be stably solved by $\ell_1$ minimization; microlocal phase space helps organize the calculations that cluster coherence requires.

NI Nov 18, 2022
Dataset of Pathloss and ToA Radio Maps With Localization Application

Çağkan Yapar, Ron Levie, Gitta Kutyniok et al.

In this article, we present a collection of radio map datasets in dense urban settings, which we generated and made publicly available. The datasets include simulated pathloss/received signal strength (RSS) and time of arrival (ToA) radio maps over a large collection of realistic dense urban settings in real city maps. The two main applications of the presented dataset are 1) learning methods that predict the pathloss from input city maps (namely, deep learning-based simulations), and 2) wireless localization. The fact that the RSS and ToA maps are computed by the same simulations over the same city maps allows for a fair comparison of the RSS and ToA-based localization methods.

IT Aug 28, 2012
Theory and Applications of Compressed Sensing

Gitta Kutyniok

Compressed sensing is a novel research area, which was introduced in 2006, and since then has already become a key concept in various areas of applied mathematics, computer science, and electrical engineering. It surprisingly predicts that high-dimensional signals, which allow a sparse representation by a suitable basis or, more generally, a frame, can be recovered from what was previously considered highly incomplete linear measurements by using efficient algorithms. This article shall serve as an introduction to and a survey about compressed sensing.
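As a small illustration of sparse recovery from incomplete linear measurements, the sketch below runs iterative soft-thresholding (ISTA) on a 2-by-3 system. ISTA is one standard solver for the $\ell_1$-regularized least-squares problem and is swapped in here as an assumption; the abstract does not name a specific algorithm, and the matrix, step size, and regularization weight are hypothetical.

```python
def soft_threshold(value, tau):
    """Shrink a scalar toward zero by tau (proximal map of tau * |.|)."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def ista(A, y, lam, step, iters=2000):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative soft-thresholding.

    A is a list of rows; step should not exceed 1 / (largest eigenvalue of A^T A).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # A^T residual is the negative gradient of the quadratic term.
        correlation = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(x[j] + step * correlation[j], step * lam) for j in range(n)]
    return x


if __name__ == "__main__":
    s = 2 ** -0.5
    # Two measurements of a three-dimensional signal (hypothetical example).
    A = [[1.0, 0.0, s], [0.0, 1.0, s]]
    y = [1.0, 0.0]                      # generated by the sparse vector (1, 0, 0)
    print(ista(A, y, lam=0.01, step=0.5))
```

The recovered vector is slightly shrunk toward zero because of the penalty weight `lam`; a smaller `lam` reduces this bias.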

NA Jun 10, 2011
Image Separation using Wavelets and Shearlets

Gitta Kutyniok, Wang-Q Lim

In this paper, we present an image separation method for separating images into point- and curvelike parts by employing a combined dictionary consisting of wavelets and compactly supported shearlets utilizing the fact that they sparsely represent point and curvilinear singularities, respectively. Our methodology is based on the very recently introduced mathematical theory of geometric separation, which shows that highly precise separation of the morphologically distinct features of points and curves can be achieved by $\ell^1$ minimization. Finally, we present some experimental results showing the effectiveness of our algorithm, in particular, the ability to accurately separate points from curves even if the curvature is relatively large due to the excellent localization property of compactly supported shearlets.

NA Feb 22, 2011
Data Separation by Sparse Representations

Gitta Kutyniok

Recently, sparsity has become a key concept in various areas of applied mathematics, computer science, and electrical engineering. One application of this novel methodology is the separation of data, which is composed of two (or more) morphologically distinct constituents. The key idea is to carefully select representation systems each providing sparse approximations of one of the components. Then the sparsest coefficient vector representing the data within the composed - and therefore highly redundant - representation system is computed by $\ell_1$ minimization or thresholding. This automatically enforces separation. This paper shall serve as an introduction to and a survey about this exciting area of research as well as a reference for the state-of-the-art of this research field. It will appear as a chapter in a book on "Compressed Sensing: Theory and Applications" edited by Yonina Eldar and Gitta Kutyniok.
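The thresholding route to separation can be sketched on a one-dimensional toy signal made of an oscillation plus a spike. This is only an analogue under stated assumptions: the two representation systems here are an orthonormal cosine (DCT-II) basis and the standard basis, and the helper names and the threshold are hypothetical.

```python
import math


def dct_basis(n):
    """Rows are the orthonormal DCT-II basis vectors of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return rows


def hard_threshold(values, tau):
    """Keep entries whose magnitude exceeds tau, zero out the rest."""
    return [v if abs(v) > tau else 0.0 for v in values]


def separate(signal, tau):
    """Split a signal into an oscillatory part and a spike part by thresholding."""
    n = len(signal)
    basis = dct_basis(n)
    # Pass 1: analyze in the cosine basis and keep only the large coefficients.
    coeffs = [sum(b[t] * signal[t] for t in range(n)) for b in basis]
    kept = hard_threshold(coeffs, tau)
    smooth = [sum(kept[k] * basis[k][t] for k in range(n)) for t in range(n)]
    # Pass 2: threshold what is left in the standard (spike) basis.
    residual = [signal[t] - smooth[t] for t in range(n)]
    spikes = hard_threshold(residual, tau)
    return smooth, spikes, kept


if __name__ == "__main__":
    n = 16
    basis = dct_basis(n)
    signal = [6.0 * basis[3][t] for t in range(n)]   # one cosine atom
    signal[7] += 5.0                                 # plus one spike
    smooth, spikes, kept = separate(signal, tau=2.5)
    print([k for k, c in enumerate(kept) if c != 0.0])
    print([t for t, v in enumerate(spikes) if v != 0.0])
```

The separation works in this toy because a spike spreads thinly over all cosine coefficients while a cosine atom spreads thinly over all samples, so each component is sparse in only one of the two systems.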

NA Oct 14, 2007
Adaptive Directional Subdivision Schemes and Shearlet Multiresolution Analysis

Gitta Kutyniok, Tomas Sauer

In this paper, we propose a solution for a fundamental problem in computational harmonic analysis, namely, the construction of a multiresolution analysis with directional components. We will do so by constructing subdivision schemes which provide a means to incorporate directionality into the data and thus the limit function. We develop a new type of non-stationary bivariate subdivision schemes, which allow the subdivision process to be adapted to directionality constraints during its performance, and we derive a complete characterization of those masks for which these adaptive directional subdivision schemes converge. In addition, we present several numerical examples to illustrate how this scheme works. Secondly, we describe a fast decomposition associated with a sparse directional representation system for two-dimensional data, where we focus on the recently introduced sparse directional representation system of shearlets. In fact, we show that the introduced adaptive directional subdivision schemes can be used as a framework for deriving a shearlet multiresolution analysis with finitely supported filters, thereby leading to a fast shearlet decomposition.

FA Jun 4, 2012
Optimally sparse approximations of 3D functions by compactly supported shearlet frames

Gitta Kutyniok, Jakob Lemvig, Wang-Q Lim

We study efficient and reliable methods of capturing and sparsely representing anisotropic structures in 3D data. As a model class for multidimensional data with anisotropic features, we introduce generalized three-dimensional cartoon-like images. This function class will have two smoothness parameters: one parameter β controlling classical smoothness and one parameter α controlling anisotropic smoothness. The class then consists of piecewise C^β-smooth functions with discontinuities on a piecewise C^α-smooth surface. We introduce a pyramid-adapted, hybrid shearlet system for the three-dimensional setting and construct frames for L^2(R^3) with this particular shearlet structure. For the smoothness range 1 < α ≤ β ≤ 2 we show that pyramid-adapted shearlet systems provide a nearly optimally sparse approximation rate within the generalized cartoon-like image model class measured by means of non-linear N-term approximations.

NA Jun 28, 2011
Optimally Sparse Frames

Peter G. Casazza, Andreas Heinecke, Felix Krahmer et al.

Frames have established themselves as a means to derive redundant, yet stable decompositions of a signal for analysis or transmission, while also promoting sparse expansions. However, when the signal dimension is large, the computation of the frame measurements of a signal typically requires a large number of additions and multiplications, and this makes a frame decomposition intractable in applications with limited computing budget. To address this problem, in this paper, we focus on frames in finite-dimensional Hilbert spaces and introduce sparsity for such frames as a new paradigm. In our terminology, a sparse frame is a frame whose elements have a sparse representation in an orthonormal basis, thereby enabling low-complexity frame decompositions. To introduce a precise meaning of optimality, we take the sum of the numbers of vectors needed of this orthonormal basis when expanding each frame vector as sparsity measure. We then analyze the recently introduced algorithm Spectral Tetris for construction of unit norm tight frames and prove that the tight frames generated by this algorithm are in fact optimally sparse with respect to the standard unit vector basis. Finally, we show that even the generalization of Spectral Tetris for the construction of unit norm frames associated with a given frame operator produces optimally sparse frames.

FA Nov 28, 2012
Analysis of Inpainting via Clustered Sparsity and Microlocal Analysis

Emily J. King, Gitta Kutyniok, Xiaosheng Zhuang

Recently, compressed sensing techniques in combination with both wavelet and directional representation systems have been very effectively applied to the problem of image inpainting. However, a mathematical analysis of these techniques which reveals the underlying geometrical content is completely missing. In this paper, we provide the first comprehensive analysis in the continuum domain utilizing the novel concept of clustered sparsity, which besides leading to asymptotic error bounds also makes the superior behavior of directional representation systems over wavelets precise. First, we propose an abstract model for problems of data recovery and derive error bounds for two different recovery schemes, namely $\ell_1$ minimization and thresholding. Second, we set up a particular microlocal model for an image governed by edges inspired by seismic data as well as a particular mask to model the missing data, namely a linear singularity masked by a horizontal strip. Applying the abstract estimate in the case of wavelets and of shearlets we prove that -- provided the size of the missing part is asymptotically to the size of the analyzing functions -- asymptotically precise inpainting can be obtained for this model. Finally, we show that shearlets can fill strictly larger gaps than wavelets in this model.

LG May 30, 2022
OOD Link Prediction Generalization Capabilities of Message-Passing GNNs in Larger Test Graphs

Yangze Zhou, Gitta Kutyniok, Bruno Ribeiro

This work provides the first theoretical study on the ability of graph Message Passing Neural Networks (gMPNNs) -- such as Graph Neural Networks (GNNs) -- to perform inductive out-of-distribution (OOD) link prediction tasks, where deployment (test) graph sizes are larger than training graphs. We first prove non-asymptotic bounds showing that link predictors based on permutation-equivariant (structural) node embeddings obtained by gMPNNs can converge to a random guess as test graphs get larger. We then propose a theoretically-sound gMPNN that outputs structural pairwise (2-node) embeddings and prove non-asymptotic bounds showing that, as test graphs grow, these embeddings converge to embeddings of a continuous function that retains its ability to predict links OOD. Empirical results on random graphs show agreement with our theoretical results.

NA Aug 2, 2011
Digital Shearlet Transform

Gitta Kutyniok, Wang-Q Lim, Xiaosheng Zhuang

Over the past years, various representation systems which sparsely approximate functions governed by anisotropic features such as edges in images have been proposed. We exemplarily mention the systems of contourlets, curvelets, and shearlets. Alongside the theoretical development of these systems, algorithmic realizations of the associated transforms were provided. However, one of the most common shortcomings of these frameworks is the lack of providing a unified treatment of the continuum and digital world, i.e., allowing a digital theory to be a natural digitization of the continuum theory. In fact, shearlet systems are the only systems so far which satisfy this property, yet still deliver optimally sparse approximations of cartoon-like images. In this chapter, we provide an introduction to digital shearlet theory with a particular focus on a unified treatment of the continuum and digital realm. In our survey we will present the implementations of two shearlet transforms, one based on band-limited shearlets and the other based on compactly supported shearlets. We will moreover discuss various quantitative measures, which allow an objective comparison with other directional transforms and an objective tuning of parameters. The codes for both presented transforms as well as the framework for quantifying performance are provided in the Matlab toolbox ShearLab.

LG Jul 5, 2023
Sumformer: Universal Approximation for Efficient Transformers

Silas Alberti, Niclas Dern, Laura Thesing et al.

Natural language processing (NLP) made an impressive jump with the introduction of Transformers. ChatGPT is one of the most famous examples, changing the perception of the possibilities of AI even outside the research community. However, besides the impressive performance, the quadratic time and space complexity of Transformers with respect to sequence length pose significant limitations for handling long sequences. While efficient Transformer architectures like Linformer and Performer with linear complexity have emerged as promising solutions, their theoretical understanding remains limited. In this paper, we introduce Sumformer, a novel and simple architecture capable of universally approximating equivariant sequence-to-sequence functions. We use Sumformer to give the first universal approximation results for Linformer and Performer. Moreover, we derive a new proof for Transformers, showing that just one attention layer is sufficient for universal approximation.

FA Aug 5, 2011
Shearlets and Optimally Sparse Approximations

Gitta Kutyniok, Jakob Lemvig, Wang-Q Lim

Multivariate functions are typically governed by anisotropic features such as edges in images or shock fronts in solutions of transport-dominated equations. One major goal both for the purpose of compression as well as for an efficient analysis is the provision of optimally sparse approximations of such functions. Recently, cartoon-like images were introduced in 2D and 3D as a suitable model class, and approximation properties were measured by considering the decay rate of the $L^2$ error of the best $N$-term approximation. Shearlet systems are to date the only representation systems which provide optimally sparse approximations of this model class in 2D as well as 3D. Even more, in contrast to all other directional representation systems, a theory for compactly supported shearlet frames was derived, and these frames also satisfy this optimality benchmark. This chapter shall serve as an introduction to and a survey about sparse approximations of cartoon-like images by band-limited and also compactly supported shearlet frames as well as a reference for the state-of-the-art of this research field.

NA Sep 28, 2014
Efficient Resolution of Anisotropic Structures

Wolfgang Dahmen, Chunyan Huang, Gitta Kutyniok et al.

We highlight some recent developments concerning the sparse representation of possibly high-dimensional functions exhibiting strong anisotropic features and low regularity in isotropic Sobolev or Besov scales. Specifically, we focus on the solution of transport equations which exhibit propagation of singularities where, additionally, high-dimensionality enters when the convection field, and hence the solutions, depend on parameters varying over some compact set. Important constituents of our approach are directionally adaptive discretization concepts motivated by compactly supported shearlet systems, and well-conditioned stable variational formulations that support trial spaces with anisotropic refinements with arbitrary directionalities. We prove that they provide tight error-residual relations which are used to contrive rigorously founded adaptive refinement schemes which converge in $L_2$. Moreover, in the context of parameter dependent problems we discuss two approaches serving different purposes and working under different regularity assumptions. For frequent query problems, making essential use of the novel well-conditioned variational formulations, a new Reduced Basis Method is outlined which exhibits a certain rate-optimal performance for indefinite, unsymmetric or singularly perturbed problems. For the radiative transfer problem with scattering a sparse tensor method is presented which mitigates or even overcomes the curse of dimensionality under suitable (so far still isotropic) regularity assumptions. Numerical examples for both methods illustrate the theoretical findings.

FA Apr 27, 2012
Geometric Separation by Single-Pass Alternating Thresholding

Gitta Kutyniok

Modern data is customarily of multimodal nature, and analysis tasks typically require separation into the single components. Although a highly ill-posed problem, the morphological difference of these components sometimes allows a very precise separation such as, for instance, in neurobiological imaging a separation into spines (pointlike structures) and dendrites (curvilinear structures). Recently, applied harmonic analysis introduced powerful methodologies to achieve this task, exploiting specifically designed representation systems in which the components are sparsely representable, combined with either performing $\ell_1$ minimization or thresholding on the combined dictionary. In this paper we provide a thorough theoretical study of the separation of a distributional model situation of point- and curvilinear singularities exploiting a surprisingly simple single-pass alternating thresholding method applied to the two complementary frames: wavelets and curvelets. Utilizing the fact that the coefficients are clustered geometrically, thereby exhibiting clustered/geometric sparsity in the chosen frames, we prove that at sufficiently fine scales arbitrarily precise separation is possible. Even more surprising, it turns out that the thresholding index sets converge to the wavefront sets of the point- and curvilinear singularities in phase space and that those wavefront sets are perfectly separated by the thresholding procedure. Main ingredients of our analysis are the novel notion of cluster coherence and clustered/geometric sparsity as well as a microlocal analysis viewpoint.

OC Jan 15, 2023
Computability of Optimizers

Yunseok Lee, Holger Boche, Gitta Kutyniok

Optimization problems are a staple of today's scientific and technical landscape. However, at present, solvers of such problems are almost exclusively run on digital hardware. Using Turing machines as a mathematical model for any type of digital hardware, in this paper, we analyze fundamental limitations of this conceptual approach of solving optimization problems. Since in most applications, the optimizer itself is of significantly more interest than the optimal value of the corresponding function, we will focus on computability of the optimizer. In fact, we will show that in various situations the optimizer is unattainable on Turing machines and consequently on digital computers. Even worse, there does not exist a Turing machine that approximates the optimizer itself up to a certain constant error. We prove such results for a variety of well-known problems from very different areas, including artificial intelligence, financial mathematics, and information theory, often deriving the even stronger result that such problems are not Banach-Mazur computable, not even in an approximate sense.

SP Oct 11, 2023
The First Pathloss Radio Map Prediction Challenge

Çağkan Yapar, Fabian Jaensch, Ron Levie et al.

To foster research and facilitate fair comparisons among recently proposed pathloss radio map prediction methods, we have launched the ICASSP 2023 First Pathloss Radio Map Prediction Challenge. In this short overview paper, we briefly describe the pathloss prediction problem, the provided datasets, the challenge task and the challenge evaluation methodology. Finally, we present the results of the challenge.

FA Apr 27, 2012
Clustered Sparsity and Separation of Cartoon and Texture

Gitta Kutyniok

Natural images are typically a composition of cartoon and texture structures. A medical image might, for instance, show a mixture of gray matter and the skull cap. One common task is to separate such an image into two single images, one containing the cartoon part and the other containing the texture part. Recently, a powerful class of algorithms using sparse approximation and $\ell_1$ minimization has been introduced to resolve this problem, and numerous inspiring empirical results have already been obtained. In this paper we provide the first thorough theoretical study of the separation of a combination of cartoon and texture structures in a model situation using this class of algorithms. The methodology we consider expands the image in a combined dictionary consisting of a curvelet tight frame and a Gabor tight frame and minimizes the $\ell_1$ norm on the analysis side. Sparse approximation properties then force the cartoon components into the curvelet coefficients and the texture components into the Gabor coefficients, thereby separating the image. Utilizing the fact that the coefficients are clustered geometrically, we prove that at sufficiently fine scales arbitrarily precise separation is possible. Main ingredients of our analysis are the novel notion of cluster coherence and clustered/geometric sparsity. Our analysis also provides a deep understanding of when separation is still possible.

RO Oct 25, 2023
Learning-based adaption of robotic friction models

Philipp Scholl, Maged Iskandar, Sebastian Wolf et al.

In the Fourth Industrial Revolution, wherein artificial intelligence and the automation of machines occupy a central role, the deployment of robots is indispensable. However, the manufacturing process using robots, especially in collaboration with humans, is highly intricate. In particular, modeling the friction torque in robotic joints is a longstanding problem due to the lack of a good mathematical description. This motivates the usage of data-driven methods in recent works. However, model-based and data-driven models often exhibit limitations in their ability to generalize beyond the specific dynamics they were trained on, as we demonstrate in this paper. To address this challenge, we introduce a novel approach based on residual learning, which aims to adapt an existing friction model to new dynamics using as little data as possible. We validate our approach by training a base neural network on a symmetric friction data set to learn an accurate relation between the velocity and the friction torque. Subsequently, to adapt to more complex asymmetric settings, we train a second network on a small dataset, focusing on predicting the residual of the initial network's output. By combining the output of both networks in a suitable manner, our proposed estimator outperforms the conventional model-based approach, an extended LuGre model, and the base neural network significantly. Furthermore, we evaluate our method on trajectories involving external loads and still observe a substantial improvement, approximately 60-70%, over the conventional approach. Our method does not rely on data with external load during training, eliminating the need for external torque sensors. This demonstrates the generalization capability of our approach, even with a small amount of data--less than a minute--enabling adaptation to diverse scenarios based on prior knowledge about friction in different settings.
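The residual-learning idea in this abstract (keep a base model fixed and fit a second model to what it gets wrong) can be sketched in a few lines. Everything concrete below is an assumption for illustration: the closed-form `base_friction` stands in for the trained base network, the residual model is a least-squares line rather than a neural network, and the two outputs are combined by simple addition.

```python
import math


def base_friction(velocity):
    """Stand-in for the base model: a symmetric friction curve (hypothetical)."""
    return 0.5 * math.tanh(20.0 * velocity) + 0.2 * velocity


def fit_linear_residual(velocities, torques, base):
    """Fit intercept and slope of a line to the base model's errors."""
    residuals = [tau - base(v) for v, tau in zip(velocities, torques)]
    n = len(velocities)
    mean_v = sum(velocities) / n
    mean_r = sum(residuals) / n
    var_v = sum((v - mean_v) ** 2 for v in velocities)
    slope = sum((v - mean_v) * (r - mean_r)
                for v, r in zip(velocities, residuals)) / var_v
    intercept = mean_r - slope * mean_v
    return intercept, slope


def combined_model(base, intercept, slope):
    """Base prediction plus the learned residual correction."""
    return lambda v: base(v) + intercept + slope * v


if __name__ == "__main__":
    # A small "new dynamics" data set with an asymmetric offset (synthetic).
    velocities = [-1.0, -0.5, 0.5, 1.0]
    torques = [base_friction(v) + 0.2 + 0.1 * v for v in velocities]
    intercept, slope = fit_linear_residual(velocities, torques, base_friction)
    adapted = combined_model(base_friction, intercept, slope)
    print(intercept, slope, adapted(0.3))
```

Because only the correction is fitted, a handful of samples is enough in this toy, which is the property the abstract emphasizes for adapting with little data.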

FA May 4, 2016
Regularization and Numerical Solution of the Inverse Scattering Problem using Shearlet Frames

Gitta Kutyniok, Volker Mehrmann, Philipp Petersen

Regularization techniques for the numerical solution of inverse scattering problems in two space dimensions are discussed. Assuming that the boundary of a scatterer is its most prominent feature, we exploit as model the class of cartoon-like functions. Since functions in this class are asymptotically optimally sparsely approximated by shearlet frames, we consider shearlets as a means for regularization in a Tikhonov method. We analyze two approaches, namely solvers for the nonlinear problem and for the linearized problem obtained by the Born approximation technique. As example for the first class we study the acoustic inverse scattering problem, and for the second class, the inverse scattering problem of the Schrödinger equation. In both cases, we derive analytical results for our approaches. Whereas our emphasis for the linearized problem is more on the theoretical side due to the standardness of associated solvers, we provide numerical examples for the nonlinear problem that highlight the effectiveness of our algorithmic approach.

FA Jul 16, 2014
$α$-Molecules

Philipp Grohs, Sandra Keiper, Gitta Kutyniok et al.

Within the area of applied harmonic analysis, various multiscale systems such as wavelets, ridgelets, curvelets, and shearlets have been introduced and successfully applied. The key property of each of those systems are their (optimal) approximation properties in terms of the decay of the $L^2$-error of the best $N$-term approximation for a certain class of functions. In this paper, we introduce the general framework of $α$-molecules, which encompasses most multiscale systems from applied harmonic analysis, in particular, wavelets, ridgelets, curvelets, and shearlets as well as extensions of such with $α$ being a parameter measuring the degree of anisotropy, as a means to allow a unified treatment of approximation results within this area. Based on an $α$-scaled index distance, we first prove that two systems of $α$-molecules are almost orthogonal. This leads to a general methodology to transfer approximation results within this framework, provided that certain consistency and time-frequency localization conditions of the involved systems of $α$-molecules are satisfied. We finally utilize these results to enable the derivation of optimal sparse approximation results for a specific class of cartoon-like functions by sufficient conditions on the 'control' parameters of a system of $α$-molecules.

LG Jan 26, 2023
Graph Scattering beyond Wavelet Shackles

Christian Koke, Gitta Kutyniok

This work develops a flexible and mathematically sound framework for the design and analysis of graph scattering networks with variable branching ratios and generic functional calculus filters. Spectrally-agnostic stability guarantees for node- and graph-level perturbations are derived; the vertex-set non-preserving case is treated by utilizing recently developed mathematical-physics based tools. Energy propagation through the network layers is investigated and related to truncation stability. New methods of graph-level feature aggregation are introduced and stability of the resulting composite scattering architectures is established. Finally, scattering transforms are extended to edge- and higher order tensorial input. Theoretical results are complemented by numerical investigations: suitably chosen scattering networks conforming to the developed theory perform better than traditional graph-wavelet based scattering approaches in social network graph classification tasks and significantly outperform other graph-based learning approaches to regression of quantum-chemical energies on QM7.

CV Nov 22, 2022
Explaining Image Classifiers with Multiscale Directional Image Representation

Stefan Kolek, Robert Windesheim, Hector Andrade Loarca et al.

Image classifiers are known to be difficult to interpret and therefore require explanation methods to understand their decisions. We present ShearletX, a novel mask explanation method for image classifiers based on the shearlet transform -- a multiscale directional image representation. Current mask explanation methods are regularized by smoothness constraints that protect against undesirable fine-grained explanation artifacts. However, the smoothness of a mask limits its ability to separate fine-detail patterns that are relevant for the classifier from nearby nuisance patterns that do not affect the classifier. ShearletX solves this problem by avoiding smoothness regularization altogether, replacing it by shearlet sparsity constraints. The resulting explanations consist of a few edges, textures, and smooth parts of the original image that are the most relevant for the decision of the classifier. To support our method, we propose a mathematical definition for explanation artifacts and an information theoretic score to evaluate the quality of mask explanations. We demonstrate the superiority of ShearletX over previous mask based explanation methods using these new metrics, and present exemplary situations where separating fine-detail patterns allows explaining phenomena that were not explainable before.

LGOct 15, 2022
Symbolic Recovery of Differential Equations: The Identifiability Problem

Philipp Scholl, Aras Bacho, Holger Boche et al.

Symbolic recovery of differential equations is the ambitious attempt at automating the derivation of governing equations with the use of machine learning techniques. In contrast to classical methods which assume the structure of the equation to be known and focus on the estimation of specific parameters, these algorithms aim to learn the structure and the parameters simultaneously. While the uniqueness and, therefore, the identifiability of parameters of governing equations are a well-addressed problem in the field of parameter estimation, it has not been investigated for symbolic recovery. However, this problem should be even more present in this field since the algorithms aim to cover larger spaces of governing equations. In this paper, we investigate under which conditions a solution of a differential equation does not uniquely determine the equation itself. For various classes of differential equations, we provide both necessary and sufficient conditions for a function to uniquely determine the corresponding differential equation. We then use our results to devise numerical algorithms aiming to determine whether a function solves a differential equation uniquely. Finally, we provide extensive numerical experiments showing that our algorithms can indeed guarantee the uniqueness of the learned governing differential equation, without assuming any knowledge about the analytic form of the function, thereby ensuring the reliability of the learned equation.

LGOct 15, 2022
Unveiling the Sampling Density in Non-Uniform Geometric Graphs

Raffaele Paolino, Aleksandar Bojchevski, Stephan Günnemann et al.

A powerful framework for studying graphs is to consider them as geometric graphs: nodes are randomly sampled from an underlying metric space, and any pair of nodes is connected if their distance is less than a specified neighborhood radius. Currently, the literature mostly focuses on uniform sampling and constant neighborhood radius. However, real-world graphs are likely to be better represented by a model in which the sampling density and the neighborhood radius can both vary over the latent space. For instance, in a social network communities can be modeled as densely sampled areas, and hubs as nodes with larger neighborhood radius. In this work, we first perform a rigorous mathematical analysis of this (more general) class of models, including derivations of the resulting graph shift operators. The key insight is that graph shift operators should be corrected in order to avoid potential distortions introduced by the non-uniform sampling. Then, we develop methods to estimate the unknown sampling density in a self-supervised fashion. Finally, we present exemplary applications in which the learnt density is used to 1) correct the graph shift operator and improve performance on a variety of tasks, 2) improve pooling, and 3) extract knowledge from networks. Our experimental findings support our theory and provide strong evidence for our model.
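The geometric graph model described in this abstract can be illustrated with a minimal sketch. The specific density (a dense cluster mixed with uniform samples on the unit square), the per-node radii, and the rule that two nodes connect when their distance is below the larger of their two radii are all assumptions chosen for illustration; the abstract does not specify them, and the function name is hypothetical.

```python
import math
import random


def sample_geometric_graph(n, seed=0):
    """Sample a toy non-uniform geometric graph on the unit square.

    Assumptions (not taken from the paper): half of the points come from
    a dense Gaussian cluster and half are uniform; each node draws its own
    radius; nodes i and j are linked if dist(i, j) < max(r_i, r_j).
    """
    rng = random.Random(seed)
    points, radii = [], []
    for i in range(n):
        if i % 2 == 0:
            # Densely sampled "community" around (0.3, 0.3), clipped to the square.
            x = min(max(rng.gauss(0.3, 0.05), 0.0), 1.0)
            y = min(max(rng.gauss(0.3, 0.05), 0.0), 1.0)
        else:
            x, y = rng.random(), rng.random()
        points.append((x, y))
        # Roughly one node in ten acts as a "hub" with a larger radius.
        radii.append(0.3 if rng.random() < 0.1 else 0.1)

    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < max(radii[i], radii[j]):
                adjacency[i][j] = adjacency[j][i] = 1
    return points, radii, adjacency
```

In such a sample, node degrees in the cluster are inflated by the sampling density rather than by the underlying geometry, which is the kind of distortion the abstract says a corrected graph shift operator should compensate for.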

CVAug 3, 2023
Neural Poisson Surface Reconstruction: Resolution-Agnostic Shape Reconstruction from Point Clouds

Hector Andrade-Loarca, Julius Hege, Daniel Cremers et al.

We introduce Neural Poisson Surface Reconstruction (nPSR), an architecture for shape reconstruction that addresses the challenge of recovering 3D shapes from points. Traditional deep neural networks face challenges with common 3D shape discretization techniques due to their computational complexity at higher resolutions. To overcome this, we leverage Fourier Neural Operators to solve the Poisson equation and reconstruct a mesh from oriented point cloud measurements. nPSR exhibits two main advantages: First, it enables efficient training on low-resolution data while achieving comparable performance at high-resolution evaluation, thanks to the resolution-agnostic nature of FNOs. This feature allows for one-shot super-resolution. Second, our method surpasses existing approaches in reconstruction quality while being differentiable and robust with respect to point sampling rates. Overall, the neural Poisson surface reconstruction not only improves upon the limitations of classical deep neural networks in shape reconstruction but also achieves superior results in terms of reconstruction quality, running time, and resolution agnosticism.

AIJul 3, 2023
Reliable AI: Does the Next Generation Require Quantum Computing?

Aras Bacho, Holger Boche, Gitta Kutyniok

In this survey, we aim to explore the fundamental question of whether the next generation of artificial intelligence requires quantum computing. Artificial intelligence is increasingly playing a crucial role in many aspects of our daily lives and is central to the fourth industrial revolution. It is therefore imperative that artificial intelligence is reliable and trustworthy. However, there are still many issues with reliability of artificial intelligence, such as privacy, responsibility, safety, and security, in areas such as autonomous driving, healthcare, robotics, and others. These problems can have various causes, including insufficient data, biases, and robustness problems, as well as fundamental issues such as computability problems on digital hardware. The cause of these computability problems is rooted in the fact that digital hardware is based on the computing model of the Turing machine, which is inherently discrete. Notably, our findings demonstrate that digital hardware is inherently constrained in solving problems about optimization, deep learning, or differential equations. Therefore, these limitations carry substantial implications for the field of artificial intelligence, in particular for machine learning. Furthermore, although it is well known that the quantum computer shows a quantum advantage for certain classes of problems, our findings establish that some of these limitations persist when employing quantum computing models based on the quantum circuit or the quantum Turing machine paradigm. In contrast, analog computing models, such as the Blum-Shub-Smale machine, exhibit the potential to surmount these limitations.

LGJun 11, 2022
Memorization-Dilation: Modeling Neural Collapse Under Label Noise

Duc Anh Nguyen, Ron Levie, Julian Lienen et al.

The notion of neural collapse refers to several emergent phenomena that have been empirically observed across various canonical classification problems. During the terminal phase of training a deep neural network, the feature embeddings of all examples of the same class tend to collapse to a single representation, and the features of different classes tend to separate as much as possible. Neural collapse is often studied through a simplified model, called the unconstrained feature representation, in which the model is assumed to have "infinite expressivity" and can map each data point to any arbitrary representation. In this work, we propose a more realistic variant of the unconstrained feature representation that takes the limited expressivity of the network into account. Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of the neural collapse. Using a model of the memorization-dilation (M-D) phenomenon, we show one mechanism by which different losses lead to different performances of the trained network on noisy data. Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.
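For readers unfamiliar with label smoothing, the following is a minimal sketch of its common textbook form, in which the one-hot target is mixed with the uniform distribution over classes. Whether the paper uses exactly this variant is not stated in the abstract; the function names and the default `eps` are illustrative assumptions.

```python
import math


def smoothed_targets(label, num_classes, eps=0.1):
    """Mix a one-hot target with the uniform distribution over classes."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
            for k in range(num_classes)]


def cross_entropy(logits, targets):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(targets, logits))
```

With `eps = 0` this reduces to ordinary cross-entropy on the one-hot label; with `eps > 0` the loss no longer rewards pushing the logit of the labeled class arbitrarily far above the others.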

39.4LGApr 8
Sparse-Aware Neural Networks for Nonlinear Functionals: Mitigating the Exponential Dependence on Dimension

Jianfei Li, Shuo Huang, Han Feng et al.

Deep neural networks have emerged as powerful tools for learning operators defined over infinite-dimensional function spaces. However, existing theories frequently encounter difficulties related to dimensionality and limited interpretability. This work investigates how sparsity can help address these challenges in functional learning, a central ingredient in operator learning. We propose a framework that employs convolutional architectures to extract sparse features from a finite number of samples, together with deep fully connected networks to effectively approximate nonlinear functionals. Using universal discretization methods, we show that sparse approximators enable stable recovery from discrete samples. In addition, both the deterministic and the random sampling schemes are sufficient for our analysis. These findings lead to improved approximation rates and reduced sample sizes in various function spaces, including those with fast frequency decay and mixed smoothness. They also provide new theoretical insights into how sparsity can alleviate the curse of dimensionality in functional learning.

54.8ROApr 3
Lightweight Learning from Actuation-Space Demonstrations via Flow Matching for Whole-Body Soft Robotic Grasping

Liudi Yang, Yang Bai, Yuhao Wang et al.

Robotic grasping under uncertainty remains a fundamental challenge due to its uncertain and contact-rich nature. Traditional rigid robotic hands, with limited degrees of freedom and compliance, rely on complex model-based and heavy feedback controllers to manage such interactions. Soft robots, by contrast, exhibit embodied mechanical intelligence: their underactuated structures and the passive flexibility of their whole body naturally accommodate uncertain contacts and enable adaptive behaviors. To harness this capability, we propose a lightweight actuation-space learning framework that infers distributional control representations for whole-body soft robotic grasping, directly from deterministic demonstrations using a flow matching model (Rectified Flow), without requiring dense sensing or heavy control loops. Using only 30 demonstrations (less than 8% of the reachable workspace), the learned policy achieves a 97.5% grasp success rate across the whole workspace, generalizes to grasped-object size variations of ±33%, and maintains stable performance when the robot's dynamic response is directly adjusted by scaling the execution time from 20% to 200%. These results demonstrate that actuation-space learning, by leveraging its passive redundant DOFs and flexibility, converts the body's mechanics into functional control intelligence and substantially reduces the burden on central controllers for this uncertain-rich task.

LGAug 12, 2024
Computability of Classification and Deep Learning: From Theoretical Limits to Practical Feasibility through Quantization

Holger Boche, Vit Fojtik, Adalbert Fono et al.

The unwavering success of deep learning in the past decade led to the increasing prevalence of deep learning methods in various application fields. However, the downsides of deep learning, most prominently its lack of trustworthiness, may not be compatible with safety-critical or high-responsibility applications requiring stricter performance guarantees. Recently, several instances of deep learning applications have been shown to be subject to theoretical limitations of computability, undermining the feasibility of performance guarantees when employed on real-world computers. We extend the findings by studying computability in the deep learning framework from two perspectives: From an application viewpoint in the context of classification problems and a general limitation viewpoint in the context of training neural networks. In particular, we show restrictions on the algorithmic solvability of classification problems that also render the algorithmic detection of failure in computations in a general setting infeasible. Subsequently, we prove algorithmic limitations in training deep neural networks even in cases where the underlying problem is well-behaved. Finally, we end with a positive observation, showing that in quantized versions of classification and deep network training, computability restrictions do not arise or can be overcome to a certain degree.

CVDec 16, 2025
DRAW2ACT: Turning Depth-Encoded Trajectories into Robotic Demonstration Videos

Yang Bai, Liudi Yang, George Eskandar et al.

Video diffusion models provide powerful real-world simulators for embodied AI but remain limited in controllability for robotic manipulation. Recent works on trajectory-conditioned video generation address this gap but often rely on 2D trajectories or single modality conditioning, which restricts their ability to produce controllable and consistent robotic demonstrations. We present DRAW2ACT, a depth-aware trajectory-conditioned video generation framework that extracts multiple orthogonal representations from the input trajectory, capturing depth, semantics, shape and motion, and injects them into the diffusion model. Moreover, we propose to jointly generate spatially aligned RGB and depth videos, leveraging cross-modality attention mechanisms and depth supervision to enhance the spatio-temporal consistency. Finally, we introduce a multimodal policy model conditioned on the generated RGB and depth sequences to regress the robot's joint angles. Experiments on Bridge V2, Berkeley Autolab, and simulation benchmarks show that DRAW2ACT achieves superior visual fidelity and consistency while yielding higher manipulation success rates compared to existing baselines.

CVDec 10, 2025
CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing

Jianfei Li, Ines Rosellon-Inclan, Gitta Kutyniok et al.

U-Net and other U-shaped architectures have achieved significant success in image deconvolution tasks. However, challenges have emerged, as these methods might generate unrealistic artifacts or hallucinations, which can interfere with analysis in safety-critical scenarios. This paper introduces a novel approach for quantifying and comprehending hallucination artifacts to ensure trustworthy computer vision models. Our method, termed the Conformal Hallucination Estimation Metric (CHEM), is applicable to any image reconstruction model, enabling efficient identification and quantification of hallucination artifacts. It offers two key advantages: it leverages wavelet and shearlet representations to efficiently extract hallucinations of image features and uses conformalized quantile regression to assess hallucination levels in a distribution-free manner. Furthermore, from an approximation theoretical perspective, we explore the reasons why U-shaped networks are prone to hallucinations. We test the proposed approach on the CANDELS astronomical image dataset with models such as U-Net, SwinUNet, and Learnlets, and provide new perspectives on hallucination from different aspects in deep learning-based image processing.

LGFeb 6
Adaptive-CaRe: Adaptive Causal Regularization for Robust Outcome Prediction

Nithya Bhasker, Fiona R. Kolbinger, Susu Hu et al.

Accurate prediction of outcomes is crucial for clinical decision-making and personalized patient care. Supervised machine learning algorithms, which are commonly used for outcome prediction in the medical domain, optimize for predictive accuracy, which can result in models latching onto spurious correlations instead of robust predictors. Causal structure learning methods on the other hand have the potential to provide robust predictors for the target, but can be too conservative because of algorithmic and data assumptions, resulting in loss of diagnostic precision. Therefore, we propose a novel model-agnostic regularization strategy, Adaptive-CaRe, for generalized outcome prediction in the medical domain. Adaptive-CaRe strikes a balance between predictive value and causal robustness by incorporating a penalty that is proportional to the difference between the estimated statistical contribution and estimated causal contribution of the input features for model predictions. Our experiments on synthetic data establish the efficacy of the proposed Adaptive-CaRe regularizer in finding robust predictors for the target while maintaining competitive predictive accuracy. With experiments on a standard causal benchmark, we provide a blueprint for navigating the trade-off between predictive accuracy and causal robustness by tweaking the regularization strength, $λ$. Validation using a real-world dataset confirms that the results translate to practical, real-domain settings. Therefore, Adaptive-CaRe provides a simple yet effective solution to the long-standing trade-off between predictive accuracy and causal robustness in the medical domain. Future work would involve studying alternate causal structure learning frameworks and complex classification models to provide deeper insights at a larger scale.

DSNov 12, 2025
When is a System Discoverable from Data? Discovery Requires Chaos

Zakhar Shumaylov, Peter Zaika, Philipp Scholl et al.

The deep learning revolution has spurred a surge of advances in the use of AI in the sciences. Within the physical sciences, the main focus has been on the discovery of dynamical systems from observational data. Yet the reliability of learned surrogates and symbolic models is often undermined by the fundamental problem of non-uniqueness. The resulting models may fit the available data perfectly, but lack genuine predictive power. This raises the question: under what conditions can the system's governing equations be uniquely identified from a finite set of observations? We show, counter-intuitively, that chaos, typically associated with unpredictability, is crucial for ensuring a system is discoverable in the space of continuous or analytic functions. The prevalence of chaotic systems in benchmark datasets may have inadvertently obscured this fundamental limitation. More concretely, we show that systems chaotic on their entire domain are discoverable from a single trajectory within the space of continuous functions, and systems chaotic on a strange attractor are analytically discoverable under a geometric condition on the attractor. As a consequence, we demonstrate for the first time that the classical Lorenz system is analytically discoverable. Moreover, we establish that analytic discoverability is impossible in the presence of first integrals, common in real-world systems. These findings help explain the success of data-driven methods in inherently chaotic domains like weather forecasting, while revealing a significant challenge for engineering applications like digital twins, where stable, predictable behavior is desired. For these non-chaotic systems, we find that while trajectory data alone is insufficient, certain prior physical knowledge can help ensure discoverability. These findings warrant a critical re-evaluation of the fundamental assumptions underpinning purely data-driven discovery.

LGNov 2, 2025
Random Spiking Neural Networks are Stable and Spectrally Simple

Ernesto Araya, Massimiliano Datres, Gitta Kutyniok

Spiking neural networks (SNNs) are a promising paradigm for energy-efficient computation, yet their theoretical foundations, especially regarding stability and robustness, remain limited compared to artificial neural networks. In this work, we study discrete-time leaky integrate-and-fire (LIF) SNNs through the lens of Boolean function analysis. We focus on noise sensitivity and stability in classification tasks, quantifying how input perturbations affect outputs. Our main result shows that wide LIF-SNN classifiers are stable on average, a property explained by the concentration of their Fourier spectrum on low-frequency components. Motivated by this, we introduce the notion of spectral simplicity, which formalizes simplicity in terms of Fourier spectrum concentration and connects our analysis to the simplicity bias observed in deep networks. Within this framework, we show that random LIF-SNNs are biased toward simple functions. Experiments on trained networks confirm that these stability properties persist in practice. Together, these results provide new insights into the stability and robustness properties of SNNs.
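As a rough illustration of why such networks can be viewed as Boolean functions, here is a minimal sketch of a single discrete-time LIF neuron mapping binary inputs to a binary spike train. The leak factor `beta`, the threshold, and the hard reset to zero are common modeling choices assumed here for illustration; the abstract does not state the exact dynamics used in the paper.

```python
def lif_neuron(inputs, weights, beta=0.9, threshold=1.0):
    """Simulate one discrete-time leaky integrate-and-fire neuron.

    inputs: list of binary input vectors, one per time step.
    Returns the binary output spike train, one entry per time step.
    """
    potential = 0.0
    spikes = []
    for x in inputs:
        # Leak the previous potential, then integrate the weighted input.
        potential = beta * potential + sum(w * xi for w, xi in zip(weights, x))
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # hard reset after a spike (assumed reset rule)
        else:
            spikes.append(0)
    return spikes
```

Because both inputs and outputs are bit strings, flipping a few input bits and checking whether the output changes gives a direct, if naive, way to probe the noise sensitivity the abstract refers to.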

CHEM-PHJun 15, 2023
On the Interplay of Subset Selection and Informed Graph Neural Networks

Niklas Breustedt, Paolo Climaco, Jochen Garcke et al.

Machine learning techniques paired with the availability of massive datasets dramatically enhance our ability to explore the chemical compound space by providing fast and accurate predictions of molecular properties. However, learning on large datasets is strongly limited by the availability of computational resources and can be infeasible in some scenarios. Moreover, the instances in the datasets may not yet be labelled and generating the labels can be costly, as in the case of quantum chemistry computations. Thus, there is a need to select small training subsets from large pools of unlabelled data points and to develop reliable ML methods that can effectively learn from small training sets. This work focuses on predicting the molecules' atomization energy in the QM9 dataset. We investigate the advantages of employing domain knowledge-based data sampling methods for an efficient training set selection combined with informed ML techniques. In particular, we show how maximizing molecular diversity in the training set selection process increases the robustness of linear and nonlinear regression techniques such as kernel methods and graph neural networks. We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer based on the rate distortion explanation framework.

LGMar 16, 2022
The Mathematics of Artificial Intelligence

Gitta Kutyniok

We currently witness the spectacular success of artificial intelligence in both science and public life. However, the development of a rigorous mathematical foundation is still at an early stage. In this survey article, which is based on an invited lecture at the International Congress of Mathematicians 2022, we will in particular focus on the current "workhorse" of artificial intelligence, namely deep neural networks. We will present the main theoretical directions along with several exemplary results and discuss key open problems.

64.8CLApr 19
Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

Felicia Körner, Maria Matveev, Florian Eichin et al.

Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual generalization emerges--particularly in the early phases of learning. To study the early trajectory of linguistic and translation capabilities, we pretrain a multilingual 1.7B model on nine diverse languages, capturing checkpoints at a much finer granularity. We further introduce a novel word-level translation dataset and trace how translation develops over training through behavioral analyses, model-component analysis, and parameter-based ablations. We find that the model quickly acquires basic linguistic capabilities in parallel with token-level copying, while translation develops in two distinct phases: an initial phase dominated by copying and surface-level similarities, and a second phase in which more generalizing translation mechanisms are developed while copying is refined. Together, these findings provide a fine-grained view of how cross-lingual generalization develops during multilingual pretraining.

41.1LGMay 21
Understanding Multimodal Failure in Action-Chunking Behavioral Cloning

Lorenzo Mazza, Massimiliano Datres, Ariel Rodriguez et al.

Behavioral cloning becomes difficult when the same observation admits several valid actions. We study this problem for action-chunking policies and show that different multimodal parameterizations fail in different ways. For latent-variable policies, posterior-prior regularization makes deployment-time sampling more reliable, but excessive regularization removes the action-conditioned information needed to distinguish demonstrated modes. Reducing this regularization can preserve mode information, but then success depends on whether the prior covers the relevant latent regions. For action-space generative policies, multimodality is constrained by the smoothness of the base-to-action transport: a map with small Lipschitz constant cannot assign substantial probability to many well-separated modes. Covering many modes therefore requires either sharp transitions in base space or off-support bridge regions in action space. Experiments on synthetic multimodal tasks and robotic simulation benchmarks support these mechanisms.

LGMar 20, 2024Code
Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning

Raffaele Paolino, Sohir Maskey, Pascal Welke et al.

We introduce $r$-loopy Weisfeiler-Leman ($r$-$\ell{}$WL), a novel hierarchy of graph isomorphism tests and a corresponding GNN framework, $r$-$\ell{}$MPNN, that can count cycles up to length $r + 2$. Most notably, we show that $r$-$\ell{}$WL can count homomorphisms of cactus graphs. This strictly extends classical 1-WL, which can only count homomorphisms of trees and, in fact, is incomparable to $k$-WL for any fixed $k$. We empirically validate the expressive and counting power of the proposed $r$-$\ell{}$MPNN on several synthetic datasets and present state-of-the-art predictive performance on various real-world datasets. The code is available at https://github.com/RPaolino/loopy

LGMar 3
The Price of Robustness: Stable Classifiers Need Overparameterization

Jonas von Berg, Adalbert Fono, Massimiliano Datres et al.

The relationship between overparameterization, stability, and generalization remains incompletely understood in the setting of discontinuous classifiers. We address this gap by establishing a generalization bound for finite function classes that improves inversely with class stability, defined as the expected distance to the decision boundary in the input domain (margin). Interpreting class stability as a quantifiable notion of robustness, we derive as a corollary a law of robustness for classification that extends the results of Bubeck and Sellke beyond smoothness assumptions to discontinuous functions. In particular, any interpolating model with $p \approx n$ parameters on $n$ data points must be unstable, implying that substantial overparameterization is necessary to achieve high stability. We obtain analogous results for parameterized infinite function classes by analyzing a stronger robustness measure derived from the margin in the codomain, which we refer to as the normalized co-stability. Experiments support our theory: stability increases with model size and correlates with test performance, while traditional norm-based measures remain largely uninformative.

92.1CCApr 10
Complexity Theory meets Ordinary Differential Equations

Adalbert Fono, Noah Wedlich, Holger Boche et al.

This contribution investigates the computational complexity of simulating linear ordinary differential equations (ODEs) on digital computers. We provide an exact characterization of the complexity blowup for a class of ODEs of arbitrary order based on their algebraic properties, extending a previous characterization of first-order ODEs. Complexity blowup indeed arises in most ODEs (except for certain degenerate cases) and means that there exists a low complexity input signal, which can be generated on a Turing machine in polynomial time, leading to a corresponding high complexity output signal of the system in the sense that the computation time for determining an approximation up to $n$ significant digits grows faster than any polynomial in $n$. Similarly, we derive an analogous blowup criterion for a subclass of first-order systems of linear ODEs. Finally, we discuss the implications for the simulation of analog systems governed by ODEs and exemplarily apply our framework to a simple model of neuronal dynamics, the leaky integrate-and-fire neuron, which is heavily employed in neuroscience.

LGMay 22, 2023Code
A Fractional Graph Laplacian Approach to Oversmoothing

Sohir Maskey, Raffaele Paolino, Aras Bacho et al.

Graph neural networks (GNNs) have shown state-of-the-art performances in various applications. However, GNNs often struggle to capture long-range dependencies in graphs due to oversmoothing. In this paper, we generalize the concept of oversmoothing from undirected to directed graphs. To this aim, we extend the notion of Dirichlet energy by considering a directed symmetrically normalized Laplacian. As vanilla graph convolutional networks are prone to oversmooth, we adopt a neural graph ODE framework. Specifically, we propose fractional graph Laplacian neural ODEs, which describe non-local dynamics. We prove that our approach allows propagating information between distant nodes while maintaining a low probability of long-distance jumps. Moreover, we show that our method is more flexible with respect to the convergence of the graph's Dirichlet energy, thereby mitigating oversmoothing. We conduct extensive experiments on synthetic and real-world graphs, both directed and undirected, demonstrating our method's versatility across diverse graph homophily levels. Our code is available at https://github.com/RPaolino/fLode .

GRFeb 24
Physics-Informed Video Diffusion For Shallow Water Equations

Yang Bai, George Eskandar, Ziyuan Liu et al.

Traditional fluid dynamics simulation pipelines combine numerical solvers with rendering, producing highly realistic results but at considerable computational cost. Diffusion-based generative video models offer a faster alternative, yet often ignore physical laws and thus fail to capture consistent dynamics. We propose a physics-informed video diffusion framework that jointly generates visual outputs and physical states. Unlike prior two-stage approaches that first simulate the physical variables and then render, we directly integrate physics constraints into the generative process, enabling simultaneous prediction of physical states and realistic videos without a separate rendering step. Built on the two-dimensional shallow water equations with terrain topography, our method produces temporally coherent water flow while maintaining physical plausibility. Experiments show that it outperforms purely data-driven video diffusion baselines in both realism and physical fidelity, while generating videos significantly faster than traditional simulation-plus-rendering pipelines.

LGApr 4, 2024
Generalization Bounds for Message Passing Networks on Mixture of Graphons

Sohir Maskey, Gitta Kutyniok, Ron Levie

We study the generalization capabilities of Message Passing Neural Networks (MPNNs), a prevalent class of Graph Neural Networks (GNN). We derive generalization bounds specifically for MPNNs with normalized sum aggregation and mean aggregation. Our analysis is based on a data generation model incorporating a finite set of template graphons. Each graph within this framework is generated by sampling from one of the graphons with a certain degree of perturbation. In particular, we extend previous MPNN generalization results to a more realistic setting, which includes the following modifications: 1) we analyze simple random graphs with Bernoulli-distributed edges instead of weighted graphs; 2) we sample both graphs and graph signals from perturbed graphons instead of clean graphons; and 3) we analyze sparse graphs instead of dense graphs. In this more realistic and challenging scenario, we provide a generalization bound that decreases as the average number of nodes in the graphs increases. Our results imply that MPNNs with higher complexity than the size of the training set can still generalize effectively, as long as the graphs are sufficiently large.
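The data generation model in this abstract, simple random graphs with Bernoulli-distributed edges sampled from a graphon, can be sketched as follows. This is a minimal illustration under assumptions: the `sparsity` scaling factor is a generic stand-in for the sparse regime, the perturbation of the graphon and the sampling of graph signals are omitted, and the function name is hypothetical.

```python
import random


def sample_graph_from_graphon(graphon, n, sparsity=1.0, seed=0):
    """Sample a simple undirected graph with Bernoulli edges from a graphon.

    graphon: symmetric function W(x, y) with values in [0, 1].
    sparsity: assumed scaling factor; values below 1 give sparser graphs.
    """
    rng = random.Random(seed)
    # Each node gets an i.i.d. uniform latent position in [0, 1).
    latents = [rng.random() for _ in range(n)]
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, sparsity * graphon(latents[i], latents[j]))
            # Edge (i, j) is present with probability p, independently.
            if rng.random() < p:
                adjacency[i][j] = adjacency[j][i] = 1
    return latents, adjacency
```

For example, `sample_graph_from_graphon(lambda x, y: x * y, 100)` yields a graph in which nodes with larger latent positions tend to have higher degree; a mixture model would first pick one of several template graphons at random and then sample from it in this way.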

LGFeb 18, 2025
Probabilistic neural operators for functional uncertainty quantification

Christopher Bülte, Philipp Scholl, Gitta Kutyniok

Neural operators aim to approximate the solution operator of a system of differential equations purely from data. They have shown immense success in modeling complex dynamical systems across various domains. However, the occurrence of uncertainties inherent in both model and data has so far rarely been taken into account, a critical limitation in complex, chaotic systems such as weather forecasting. In this paper, we introduce the probabilistic neural operator (PNO), a framework for learning probability distributions over the output function space of neural operators. PNO extends neural operators with generative modeling based on strictly proper scoring rules, integrating uncertainty information directly into the training process. We provide a theoretical justification for the approach and demonstrate improved performance in quantifying uncertainty across different domains and with respect to different baselines. Furthermore, PNO requires minimal adjustment to existing architectures, shows improved performance for most probabilistic prediction tasks, and leads to well-calibrated predictive distributions and adequate uncertainty representations even for long dynamical trajectories. Implementing our approach into large-scale models for physical applications can lead to improvements in corresponding uncertainty quantification and extreme event identification, ultimately leading to a deeper understanding of the prediction of such surrogate models.

LGApr 25, 2025
An Axiomatic Assessment of Entropy- and Variance-based Uncertainty Quantification in Regression

Christopher Bülte, Yusuf Sale, Timo Löhr et al.

Uncertainty quantification (UQ) is crucial in machine learning, yet most (axiomatic) studies of uncertainty measures focus on classification, leaving a gap in regression settings with limited formal justification and evaluations. In this work, we introduce a set of axioms to rigorously assess measures of aleatoric, epistemic, and total uncertainty in supervised regression. By utilizing a predictive exponential family, we can generalize commonly used approaches for uncertainty representation and corresponding uncertainty measures. More specifically, we analyze the widely used entropy- and variance-based measures regarding limitations and challenges. Our findings provide a principled foundation for uncertainty quantification in regression, offering theoretical insights and practical guidelines for reliable uncertainty assessment.