G. Welper

h-index: 3

9 papers

41 citations

Novelty: 48%

AI Score: 25

Ranked #170,199 of 205,806 authors (top 83%); #36,966 in LG (top 87%)

9 Papers

NA · Jan 2, 2016

Adaptive Anisotropic Petrov-Galerkin Methods for First Order Transport Equations

W. Dahmen, G. Kutyniok, W.-Q. Lim et al.

This paper builds on recent developments of adaptive methods for linear transport equations based on certain stable variational formulations of Petrov-Galerkin type. The variational formulations allow us to employ meshes with cells of arbitrary aspect ratios. We develop a refinement scheme generating highly anisotropic partitions that is inspired by shearlet systems. We establish approximation rates for N-term approximations from corresponding piecewise polynomials for certain compact cartoon classes of functions. In contrast to earlier results in a curvelet or shearlet context, the cartoon classes are concisely defined through certain characteristic parameters, and the dependence of the approximation rates on these parameters is made explicit here. The approximation rate results then serve as a benchmark for subsequent applications to adaptive Galerkin solvers for transport equations. In numerical experiments, the new algorithms track C^2-curved shear layers and discontinuities stably and accurately, and realize essentially optimal rates. Finally, we treat parameter dependent transport problems, which arise in kinetic models as well as in radiative transfer. In heterogeneous media these problems feature propagation of singularities along curved characteristics, precluding, in particular, fast marching methods based on ray-tracing. Since the solutions are now functions of spatial variables and parameters, one has to address the curse of dimensionality. We show computationally, for a model parametric transport problem in heterogeneous media in 2 + 1 dimensions, that by sparse tensorization of the proposed directionally adaptive spatial scheme with hierarchic collocation in ordinate space, based on a stable variational formulation in high-dimensional phase space, the curse of dimensionality can be removed when approximating averaged bulk quantities.

NA · Oct 31, 2017

$h$ and $hp$-adaptive Interpolation by Transformed Snapshots for Parametric and Stochastic Hyperbolic PDEs

G. Welper

The numerical approximation of solutions of parametric or stochastic hyperbolic PDEs is still a serious challenge. Because of shock singularities, most methods from the elliptic and parabolic regime, such as reduced basis methods, POD or polynomial chaos expansions, perform poorly. Recently, Welper [Interpolation of functions with parameter dependent jumps by transformed snapshots. SIAM Journal on Scientific Computing, 39(4):A1225-A1250, 2017] introduced a new approximation method based on the alignment of the jump sets of the snapshots. If the structure of the jump sets changes with the parameter, this assumption is too restrictive. However, these changes are typically local in parameter space, so in this paper we explore $h$ and $hp$-adaptive methods to resolve them. Since local refinements do not scale to high dimensions, we introduce an alternative "tensorized" adaptation method.

LG · Sep 17, 2022

Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$

R. Gentile, G. Welper

Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations with a minimal number of weights. In most of the current literature these weights are fully or partially hand-crafted, showing the capabilities of neural networks but not necessarily their practical performance. In contrast, optimization theory for neural networks heavily relies on an abundance of weights in over-parametrized regimes. This paper balances these two demands and provides an approximation result for shallow networks in $1d$ with non-convex weight optimization by gradient descent. We consider finite width networks and infinite sample limits, which is the typical setup in approximation theory. Technically, this problem is not over-parametrized; however, some form of redundancy reappears as a loss in approximation rate compared to the best possible rates.
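As a rough illustration of the setting described in this abstract, the following sketch trains a finite-width shallow ReLU network in $1d$ by plain gradient descent on a quadrature approximation of the continuous $L_2$ loss. It is not the construction analyzed in the paper: the $1/\sqrt{m}$ scaling, the deterministic initialization, the choice of trained weights, the step size and the target function are all assumptions made for this example.

```python
import math


def relu(z):
    return z if z > 0.0 else 0.0


def train_shallow_net(target, width=20, n_quad=50, lr=0.2, steps=300):
    """Gradient descent for f(x) = width**-0.5 * sum_r a_r * relu(x - b_r) on [0, 1].

    The continuous L2 loss is approximated by a midpoint rule with n_quad nodes
    (an assumption standing in for the infinite sample limit).
    Returns (initial_loss, final_loss).
    """
    xs = [(i + 0.5) / n_quad for i in range(n_quad)]
    ys = [target(x) for x in xs]
    # Hypothetical deterministic initialization: alternating signs, uniform breakpoints.
    a = [1.0 if r % 2 == 0 else -1.0 for r in range(width)]
    b = [r / width for r in range(width)]
    scale = 1.0 / math.sqrt(width)

    def predict(x):
        return scale * sum(ar * relu(x - br) for ar, br in zip(a, b))

    def loss():
        return 0.5 * sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / n_quad

    initial_loss = loss()
    for _ in range(steps):
        grad_a = [0.0] * width
        grad_b = [0.0] * width
        for x, y in zip(xs, ys):
            res = predict(x) - y
            for r in range(width):
                if x > b[r]:  # the ReLU is active only to the right of its breakpoint
                    grad_a[r] += res * scale * (x - b[r]) / n_quad
                    grad_b[r] -= res * scale * a[r] / n_quad
        # Both outer weights and breakpoints are updated, so the problem is non-convex.
        for r in range(width):
            a[r] -= lr * grad_a[r]
            b[r] -= lr * grad_b[r]
    return initial_loss, loss()


initial_loss, final_loss = train_shallow_net(lambda x: math.sin(2.0 * math.pi * x))
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The number of quadrature nodes exceeds the width, so the discrete problem is not over-parametrized either; this mirrors the regime the abstract refers to only loosely.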

LG · Sep 9, 2023

Approximation Results for Gradient Descent trained Neural Networks

G. Welper

The paper contains approximation guarantees for neural networks that are trained with gradient flow, with error measured in the continuous $L_2(\mathbb{S}^{d-1})$-norm on the $d$-dimensional unit sphere and targets that are Sobolev smooth. The networks are fully connected, of constant depth and increasing width. Although all layers are trained, the gradient flow convergence is based on a neural tangent kernel (NTK) argument for the non-convex second-to-last layer. Unlike standard NTK analysis, the continuous error norm implies an under-parametrized regime, made possible by the natural smoothness assumption required for approximation. The typical over-parametrization re-enters the results in the form of a loss in approximation rate relative to established approximation methods for Sobolev smooth functions.

LG · May 19, 2024

Approximation and Gradient Descent Training with Neural Networks

G. Welper

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training error, these two theories are not immediately compatible. Recent work uses the smoothness that is required for approximation results to extend a neural tangent kernel (NTK) optimization argument to an under-parametrized regime and show direct approximation bounds for networks trained by gradient flow. Since gradient flow is only an idealization of a practical method, this paper establishes analogous results for networks trained by gradient descent.

LG · Feb 6, 2023

Learning Trees of $\ell_0$-Minimization Problems

G. Welper

The problem of computing minimally sparse solutions of under-determined linear systems is $NP$-hard in general. Subsets with extra properties may allow efficient algorithms; most notably, problems with the restricted isometry property (RIP) can be solved by convex $\ell_1$-minimization. While these classes have been very successful, they leave out many practical applications. In this paper, we consider adaptable classes that are tractable after training on a curriculum of increasingly difficult samples. The setup is intended as a candidate model for a human mathematician, who may not be able to tackle an arbitrary proof right away, but may be successful in relatively flexible subclasses, or areas of expertise, after training on a suitable curriculum.
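To make the hardness statement concrete, here is a minimal brute-force sketch of $\ell_0$-minimization: it enumerates supports of increasing size and returns the first exact solution of $Ax = b$. This is not the method of the paper; the function names and the small integer test system are hypothetical, and exact rational arithmetic is used only to avoid tolerance choices. The number of candidate supports grows combinatorially with the number of columns, which is why the approach does not scale.

```python
from fractions import Fraction
from itertools import combinations


def solve_exact(M, rhs):
    """Gauss-Jordan elimination over Fractions; returns None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def sparsest_solution(A, b):
    """Return a solution of A x = b with the fewest nonzeros, or None if none is found."""
    m, n = len(A), len(A[0])
    for k in range(min(m, n) + 1):  # try supports of size 0, 1, 2, ...
        for support in combinations(range(n), k):
            cols = [[A[i][j] for i in range(m)] for j in support]
            # Normal equations for the least-squares fit on this support.
            gram = [[sum(p * q for p, q in zip(c1, c2)) for c2 in cols] for c1 in cols]
            rhs = [sum(p * q for p, q in zip(c, b)) for c in cols]
            z = solve_exact(gram, rhs)
            if z is None:  # linearly dependent columns; a smaller support covers this case
                continue
            if all(sum(A[i][j] * zj for j, zj in zip(support, z)) == b[i] for i in range(m)):
                x = [Fraction(0)] * n
                for j, zj in zip(support, z):
                    x[j] = zj
                return x
    return None


# Hypothetical 3 x 5 system whose right-hand side is generated by a 2-sparse vector.
A = [[1, 0, 1, 2, 0],
     [0, 1, 1, 0, 1],
     [0, 0, 1, 1, 3]]
b = [-2, 3, -1]
x = sparsest_solution(A, b)
print([int(v) for v in x])
```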

LG · Jan 20, 2021

Non-Convex Compressed Sensing with Training Data

G. Welper

Efficient algorithms for the sparse solution of under-determined linear systems $Ax = b$ are known for matrices $A$ satisfying suitable assumptions like the restricted isometry property (RIP). Without such assumptions, little is known, and without any assumptions on $A$ the problem is $NP$-hard. A common approach is to replace $\ell_1$ by $\ell_p$ minimization for $0 < p < 1$, which is no longer convex and typically requires some form of local initial values for provably convergent algorithms. In this paper, we consider an alternative, where instead of suitable initial values we are provided with extra training problems $Ax = B_l$, $l=1, \dots, p$ that are related to our compressed sensing problem. They allow us to find the solution of the original problem $Ax = b$ with high probability in the range of a one-layer linear neural network with comparatively few assumptions on the matrix $A$.

LG · Jul 27, 2020

Universality of Gradient Descent Neural Network Training

G. Welper

It has been observed that design choices of neural networks are often crucial for their successful optimization. In this article, we therefore discuss the question of whether it is always possible to redesign a neural network so that it trains well with gradient descent. This yields the following universality result: If, for a given network, there is any algorithm that can find good network weights for a classification task, then there exists an extension of this network that reproduces these weights and the corresponding forward output by mere gradient descent training. The construction is not intended for practical computations, but it provides some orientation on the possibilities of meta-learning and related approaches.

NA · May 6, 2015

Transformed snapshot interpolation

G. Welper

Functions with jumps and kinks, typically arising from parameter dependent or stochastic hyperbolic PDEs, are notoriously difficult to approximate. If the jump location in physical space is parameter dependent or random, standard approximation techniques like reduced basis methods, POD, polynomial chaos, etc. are known to yield poor convergence rates. In order to improve these rates, we propose a new approximation scheme. Like reduced basis methods, it relies on snapshots for the reconstruction of parameter dependent functions, so that it is efficiently applicable in a PDE context. However, we allow a transformation of the physical coordinates before the use of a snapshot in the reconstruction, which allows us to realign the moving discontinuities and yields high convergence rates. The transforms are automatically computed by minimizing a training error. In order to show the feasibility of this approach, it is tested in 1d and 2d numerical experiments.
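The effect of realigning the jumps can be seen in a small $1d$ sketch. It compares plain linear interpolation of two snapshots with interpolation of coordinate-shifted snapshots for a toy function whose jump sits at $x = \mu$. The toy function, the snapshot parameters and the shift map are assumptions made for this example; in particular, the shift is taken as known here, whereas the abstract states that the transforms are computed by minimizing a training error.

```python
def u(x, mu):
    """Hypothetical parametric function: smooth profile with a jump at x = mu."""
    return 1.0 + 0.5 * mu * x if x < mu else 0.0


def plain_interp(x, mu, mu0, mu1):
    """Linear interpolation in the parameter, snapshots evaluated at the same x."""
    t = (mu - mu0) / (mu1 - mu0)
    return (1.0 - t) * u(x, mu0) + t * u(x, mu1)


def transformed_interp(x, mu, mu0, mu1):
    """Same interpolation, but each snapshot is evaluated at a shifted coordinate.

    The shift x -> x + (mu_i - mu) moves the jump of snapshot i from mu_i to mu,
    so both snapshots jump where the target does (shift assumed known).
    """
    t = (mu - mu0) / (mu1 - mu0)
    return (1.0 - t) * u(x + (mu0 - mu), mu0) + t * u(x + (mu1 - mu), mu1)


def l1_error(approx, mu, mu0, mu1, n=2000):
    """Midpoint-rule approximation of the L1 error on [0, 1]."""
    h = 1.0 / n
    return h * sum(
        abs(u((i + 0.5) * h, mu) - approx((i + 0.5) * h, mu, mu0, mu1))
        for i in range(n)
    )


mu0, mu1, mu = 0.3, 0.7, 0.5
err_plain = l1_error(plain_interp, mu, mu0, mu1)
err_transformed = l1_error(transformed_interp, mu, mu0, mu1)
print(f"plain: {err_plain:.4f}, transformed: {err_transformed:.4f}")
```

Plain interpolation produces a staircase with jumps at both snapshot locations, while the shifted version has a single jump in the right place and only a small error from the smooth part.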