62.5NAJun 2
Sampling and reconstruction of convex functionsAndrea Bonito, Albert Cohen, Wolfgang Dahmen et al.
We discuss optimal recovery for classes of multivariate convex functions from given point samples, as well as the sampling numbers of these classes, corresponding to optimal sample choices. Upper and lower bounds for either variant are established when the reconstruction error is measured in $L_p$ for $1\leq p\leq \infty$. These bounds match, sometimes up to logarithmic factors, and therefore characterize the respective optimal rate of decay. For classical smoothness classes such as Sobolev, Hölder or Besov spaces, it is well known that the optimal decay rate of sampling numbers can be achieved by sampling on uniform tensor product grids and using linear methods of reconstruction, such as piecewise polynomial interpolation. One of the main findings in this paper is that for classes of convex functions, these procedures generally produce suboptimal rates, except when $p=1$ and $p=\infty$, and are outperformed by nonlinear reconstruction methods that do not employ tensor product grids.
MLNov 30, 2022
Limitations on approximation by deep and shallow neural networksGuergana Petrova, Przemysław Wojtaszczyk
We prove Carl's type inequalities for the error of approximation of compact sets K by deep and shallow neural networks. This in turn gives lower bounds on how well we can approximate the functions in K when requiring the approximants to come from outputs of such networks. Our results are obtained as a byproduct of the study of the recently introduced Lipschitz widths.
MLOct 11, 2023
Neural networks: deep, shallow, or in between?Guergana Petrova, Przemyslaw Wojtaszczyk
We give estimates from below for the error of approximation of a compact subset from a Banach space by the outputs of feed-forward neural networks with width W, depth l and Lipschitz activation functions. We show that, modulo logarithmic factors, rates better that entropy numbers' rates are possibly attainable only for neural networks for which the depth l goes to infinity, and that there is no gain if we fix the depth and let the width W go to infinity.
LGMar 30, 2022
Optimal LearningPeter Binev, Andrea Bonito, Ronald DeVore et al.
This paper studies the problem of learning an unknown function $f$ from given data about $f$. The learning problem is to give an approximation $\hat f$ to $f$ that predicts the values of $f$ away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about $f$ (known as a model class assumption), (ii) how we measure the accuracy of how well $\hat f$ predicts $f$, (iii) what is known about the data and data sites, (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal $\hat f$ can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation $\hat f$ of the function $f$ from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of $f$. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.
LGJul 28, 2021
Neural Network Approximation of Refinable FunctionsIngrid Daubechies, Ronald DeVore, Nadav Dym et al.
In the desire to quantify the success of neural networks in deep learning and other applications, there is a great interest in understanding which functions are efficiently approximated by the outputs of neural networks. By now, there exists a variety of results which show that a wide range of functions can be approximated with sometimes surprising accuracy by these outputs. For example, it is known that the set of functions that can be approximated with exponential accuracy (in terms of the number of parameters used) includes, on one hand, very smooth functions such as polynomials and analytic functions (see e.g. \cite{E,S,Y}) and, on the other hand, very rough functions such as the Weierstrass function (see e.g. \cite{EPGB,DDFHP}), which is nowhere differentiable. In this paper, we add to the latter class of rough functions by showing that it also includes refinable functions. Namely, we show that refinable functions are approximated by the outputs of deep ReLU networks with a fixed width and increasing depth with accuracy exponential in terms of their number of parameters. Our results apply to functions used in the standard construction of wavelets as well as to functions constructed via subdivision algorithms in Computer Aided Geometric Design.
NAAug 5, 2016
Data Assimilation and Sampling in Banach spacesRonald DeVore, Guergana Petrova, Przemyslaw Wojtaszczyk
This paper studies the problem of approximating a function $f$ in a Banach space $X$ from measurements $l_j(f)$, $j=1,\dots,m$, where the $l_j$ are linear functionals from $X^*$. Most results study this problem for classical Banach spaces $X$ such as the $L_p$ spaces, $1\le p\le \infty$, and for $K$ the unit ball of a smoothness space in $X$. Our interest in this paper is in the model classes $K=K(ε,V)$, with $ε>0$ and $V$ a finite dimensional subspace of $X$, which consists of all $f\in X$ such that $dist(f,V)_X\le ε$. These model classes, called {\it approximation sets}, arise naturally in application domains such as parametric partial differential equations, uncertainty quantification, and signal processing. A general theory for the recovery of approximation sets in a Banach space is given. This theory includes tight a priori bounds on optimal performance, and algorithms for finding near optimal approximations. We show how the recovery problem for approximation sets is connected with well-studied concepts in Banach space theory such as liftings and the angle between spaces. Examples are given that show how this theory can be used to recover several recent results on sampling and data assimilation.
NAJun 15, 2015
Data Assimilation in Reduced ModelingPeter Binev, Albert Cohen, Wolfgang Dahmen et al.
We consider the problem of optimal recovery of an element $u$ of a Hilbert space $\mathcal{H}$ from $m$ measurements obtained through known linear functionals on $\mathcal{H}$. Problems of this type are well studied \cite{MRW} under an assumption that $u$ belongs to a prescribed model class, e.g. a known compact subset of $\mathcal{H}$. Motivated by reduced modeling for parametric partial differential equations, this paper considers another setting where the additional information about $u$ is in the form of how well $u$ can be approximated by a certain known subspace $V_n$ of $\mathcal{H}$ of dimension $n$, or more generally, how well $u$ can be approximated by each $k$-dimensional subspace $V_k$ of a sequence of nested subspaces $V_0\subset V_1\cdots\subset V_n$. A recovery algorithm for the one-space formulation, proposed in \cite{MPPY}, is proven here to be optimal and to have a simple formulation, if certain favorable bases are chosen to represent $V_n$ and the measurements. The major contribution of the present paper is to analyze the multi-space case for which it is shown that the set of all $u$ satisfying the given information can be described as the intersection of a family of known ellipsoids in $\mathcal{H}$. It follows that a near optimal recovery algorithm in the multi-space problem is to identify any point in this intersection which can provide a much better accuracy than in the one-space problem. Two iterative algorithms based on alternating projections are proposed for recovery in the multi-space problem. A detailed analysis of one of them provides a posteriori performance estimates for the iterates, stopping criteria, and convergence rates. Since the limit of the algorithm is a point in the intersection of the aforementioned ellipsoids, it provides a near optimal recovery for $u$.
NAMay 14, 2015
Rescaled Pure Greedy Algorithm for Hilbert and Banach SpacesGuergana Petrova
We show that a very simple modification of the Pure Greedy Algorithm for approximating functions by sparse sums from a dictionary in a Hilbert or more generally a Banach space has optimal convergence rates on the class of convex combinations of dictionary elements