Ronald DeVore

NA
10 papers
1,521 citations
Novelty 46%
AI Score 44

10 Papers

NA Jul 3, 2008
Iteratively re-weighted least squares minimization for sparse recovery

Ingrid Daubechies, Ronald DeVore, Massimo Fornasier et al.

We analyze an Iteratively Re-weighted Least Squares (IRLS) algorithm for promoting $\ell_1$-minimization in sparse and compressible vector recovery. We prove its convergence and we estimate its local rate. We show how the algorithm can be modified in order to promote $\ell_t$-minimization for $t<1$, and how this modification produces superlinear rates of convergence.
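To make the IRLS idea in this abstract concrete, here is a minimal sketch, not the authors' reference implementation. It assumes the equality-constrained setting (minimize $\|x\|_1$ subject to $Ax=y$), a full-row-rank matrix `A`, and a sparsity guess `K`; the function name `irls_l1`, the stopping tolerance `tol`, and the particular rule for shrinking the smoothing parameter `eps` are illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np

def irls_l1(A, y, K, max_iter=200, tol=1e-9):
    """Sketch of IRLS for: minimize ||x||_1 subject to A x = y.

    Assumptions (not from the abstract): A has full row rank, K < A.shape[1]
    is a guess of the sparsity, and eps is shrunk using the (K+1)-th largest
    entry of the current iterate.
    """
    m, N = A.shape
    w = np.ones(N)      # unit weights: the first iterate is the minimum l2-norm solution
    eps = 1.0           # smoothing parameter that keeps the weights finite
    x = np.zeros(N)
    for _ in range(max_iter):
        d = 1.0 / w                     # diagonal of D = W^{-1}
        AD = A * d                      # same as A @ diag(d), via broadcasting
        # Weighted least squares step: argmin sum_j w_j x_j^2 subject to A x = y
        x = d * (A.T @ np.linalg.solve(AD @ A.T, y))
        r = np.sort(np.abs(x))[::-1]    # magnitudes in nonincreasing order
        eps = min(eps, r[K] / N)        # r[K] is the (K+1)-th largest magnitude
        if eps < tol:                   # stop before the linear system degenerates
            break
        w = 1.0 / np.sqrt(x**2 + eps**2)  # large weights push small entries toward zero
    return x
```

Each pass solves a weighted least squares problem in closed form, so every iterate satisfies the constraint $Ax=y$ up to solver accuracy; only the weights change between passes.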

NA Jul 23, 2014
Tensor-Sparsity of Solutions to High-Dimensional Elliptic Partial Differential Equations

Wolfgang Dahmen, Ronald DeVore, Lars Grasedyck et al.

A recurring theme in attempts to break the curse of dimensionality in the numerical approximations of solutions to high-dimensional partial differential equations (PDEs) is to employ some form of sparse tensor approximation. Unfortunately, there are only a few results that quantify the possible advantages of such an approach. This paper introduces a class $Σ_n$ of functions, which can be written as a sum of rank-one tensors using a total of at most $n$ parameters and then uses this notion of sparsity to prove a regularity theorem for certain high-dimensional elliptic PDEs. It is shown, among other results, that whenever the right-hand side $f$ of the elliptic PDE can be approximated with a certain rate $\mathcal{O}(n^{-r})$ in the norm of ${\mathrm H}^{-1}$ by elements of $Σ_n$, then the solution $u$ can be approximated in ${\mathrm H}^1$ from $Σ_n$ to accuracy $\mathcal{O}(n^{-r'})$ for any $r'\in (0,r)$. Since these results require knowledge of the eigenbasis of the elliptic operator considered, we propose a second "basis-free" model of tensor sparsity and prove a regularity theorem for this second sparsity model as well. We then proceed to address the important question of the extent to which such regularity theorems translate into results on computational complexity. It is shown how this second model can be used to derive computational algorithms with performance that breaks the curse of dimensionality on certain model high-dimensional elliptic PDEs with tensor-sparse data.

97.6 NA Jun 2
Sampling and reconstruction of convex functions

Andrea Bonito, Albert Cohen, Wolfgang Dahmen et al.

We discuss optimal recovery for classes of multivariate convex functions from given point samples, as well as the sampling numbers of these classes, corresponding to optimal sample choices. Upper and lower bounds for either variant are established when the reconstruction error is measured in $L_p$ for $1\leq p\leq \infty$. These bounds match, sometimes up to logarithmic factors, and therefore characterize the respective optimal rate of decay. For classical smoothness classes such as Sobolev, Hölder or Besov spaces, it is well known that the optimal decay rate of sampling numbers can be achieved by sampling on uniform tensor product grids and using linear methods of reconstruction, such as piecewise polynomial interpolation. One of the main findings in this paper is that for classes of convex functions, these procedures generally produce suboptimal rates, except when $p=1$ and $p=\infty$, and are outperformed by nonlinear reconstruction methods that do not employ tensor product grids.

ML Jul 28, 2023
Weighted variation spaces and approximation by shallow ReLU networks

Ronald DeVore, Robert D. Nowak, Rahul Parhi et al.

We investigate the approximation of functions $f$ on a bounded domain $Ω\subset \mathbb{R}^d$ by the outputs of single-hidden-layer ReLU neural networks of width $n$. This form of nonlinear $n$-term dictionary approximation has been intensely studied since it is the simplest case of neural network approximation (NNA). There are several celebrated approximation results for this form of NNA that introduce novel model classes of functions on $Ω$ whose approximation rates do not grow unbounded with the input dimension. These novel classes include Barron classes, and classes based on sparsity or variation such as the Radon-domain BV classes. The present paper is concerned with the definition of these novel model classes on domains $Ω$. The current definition of these model classes does not depend on the domain $Ω$. A new and more proper definition of model classes on domains is given by introducing the concept of weighted variation spaces. These new model classes are intrinsic to the domain itself. The importance of these new model classes is that they are strictly larger than the classical (domain-independent) classes. Yet, it is shown that they maintain the same NNA rates.
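For readers who want the approximating object spelled out, the following is a small illustrative sketch of the output of a single-hidden-layer ReLU network of width $n$, that is, $x \mapsto \sum_{i=1}^{n} a_i \max(w_i\cdot x + b_i, 0) + c$. The function name `shallow_relu_net`, the parameter names, and the hat-function example are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def shallow_relu_net(x, W, b, a, c=0.0):
    """Evaluate sum_i a[i] * relu(W[i] . x + b[i]) + c at each input point.

    x: array of shape (num_points, d); W: (n, d); b: (n,); a: (n,).
    The width n is the number of terms in the dictionary approximation.
    """
    x = np.asarray(x, dtype=float)
    hidden = np.maximum(x @ W.T + b, 0.0)   # ReLU activations, shape (num_points, n)
    return hidden @ a + c

# Hypothetical example: a width-3 network that reproduces the hat function on [0, 1],
# hat(x) = relu(x) - 2 * relu(x - 0.5) + relu(x - 1).
W_hat = np.array([[1.0], [1.0], [1.0]])
b_hat = np.array([0.0, -0.5, -1.0])
a_hat = np.array([1.0, -2.0, 1.0])
```

The width $n$ counts the ReLU terms, which is why this form of approximation is described as nonlinear $n$-term dictionary approximation: both the outer coefficients and the inner weights and biases are free to depend on the target function.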

LG Mar 30, 2022
Optimal Learning

Peter Binev, Andrea Bonito, Ronald DeVore et al.

This paper studies the problem of learning an unknown function $f$ from given data about $f$. The learning problem is to give an approximation $\hat f$ to $f$ that predicts the values of $f$ away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about $f$ (known as a model class assumption), (ii) how we measure the accuracy of how well $\hat f$ predicts $f$, (iii) what is known about the data and data sites, (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, it is shown in this paper that a near optimal $\hat f$ can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization which is commonly used in modern machine learning. The main results of this paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation $\hat f$ of the function $f$ from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of $f$. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.

LG Jul 28, 2021
Neural Network Approximation of Refinable Functions

Ingrid Daubechies, Ronald DeVore, Nadav Dym et al.

In the desire to quantify the success of neural networks in deep learning and other applications, there is a great interest in understanding which functions are efficiently approximated by the outputs of neural networks. By now, there exists a variety of results which show that a wide range of functions can be approximated with sometimes surprising accuracy by these outputs. For example, it is known that the set of functions that can be approximated with exponential accuracy (in terms of the number of parameters used) includes, on one hand, very smooth functions such as polynomials and analytic functions (see e.g. \cite{E,S,Y}) and, on the other hand, very rough functions such as the Weierstrass function (see e.g. \cite{EPGB,DDFHP}), which is nowhere differentiable. In this paper, we add to the latter class of rough functions by showing that it also includes refinable functions. Namely, we show that refinable functions are approximated by the outputs of deep ReLU networks with a fixed width and increasing depth with accuracy exponential in terms of their number of parameters. Our results apply to functions used in the standard construction of wavelets as well as to functions constructed via subdivision algorithms in Computer Aided Geometric Design.

NA Aug 5, 2016
Data Assimilation and Sampling in Banach spaces

Ronald DeVore, Guergana Petrova, Przemyslaw Wojtaszczyk

This paper studies the problem of approximating a function $f$ in a Banach space $X$ from measurements $l_j(f)$, $j=1,\dots,m$, where the $l_j$ are linear functionals from $X^*$. Most results study this problem for classical Banach spaces $X$ such as the $L_p$ spaces, $1\le p\le \infty$, and for $K$ the unit ball of a smoothness space in $X$. Our interest in this paper is in the model classes $K=K(ε,V)$, with $ε>0$ and $V$ a finite dimensional subspace of $X$, which consists of all $f\in X$ such that $\mathrm{dist}(f,V)_X\le ε$. These model classes, called approximation sets, arise naturally in application domains such as parametric partial differential equations, uncertainty quantification, and signal processing. A general theory for the recovery of approximation sets in a Banach space is given. This theory includes tight a priori bounds on optimal performance, and algorithms for finding near optimal approximations. We show how the recovery problem for approximation sets is connected with well-studied concepts in Banach space theory such as liftings and the angle between spaces. Examples are given that show how this theory can be used to recover several recent results on sampling and data assimilation.

NA Sep 23, 2015
Sparse polynomial approximation of parametric elliptic PDEs. Part II: lognormal coefficients

Markus Bachmayr, Albert Cohen, Ronald DeVore et al.

Elliptic partial differential equations with diffusion coefficients of lognormal form, that is $a=\exp(b)$, where $b$ is a Gaussian random field, are considered. We study the $\ell^p$ summability properties of the Hermite polynomial expansion of the solution in terms of the countably many scalar parameters appearing in a given representation of $b$. These summability results have direct consequences on the approximation rates of best $n$-term truncated Hermite expansions. Our results significantly improve on the state-of-the-art estimates available for this problem. In particular, they take into account the support properties of the basis functions involved in the representation of $b$, in addition to the size of these functions. One interesting conclusion from our analysis is that in certain relevant cases, the Karhunen-Loève representation of $b$ may not be the best choice concerning the resulting sparsity and approximability of the Hermite expansion.

NA Jun 15, 2015
Data Assimilation in Reduced Modeling

Peter Binev, Albert Cohen, Wolfgang Dahmen et al.

We consider the problem of optimal recovery of an element $u$ of a Hilbert space $\mathcal{H}$ from $m$ measurements obtained through known linear functionals on $\mathcal{H}$. Problems of this type are well studied \cite{MRW} under an assumption that $u$ belongs to a prescribed model class, e.g. a known compact subset of $\mathcal{H}$. Motivated by reduced modeling for parametric partial differential equations, this paper considers another setting where the additional information about $u$ is in the form of how well $u$ can be approximated by a certain known subspace $V_n$ of $\mathcal{H}$ of dimension $n$, or more generally, how well $u$ can be approximated by each $k$-dimensional subspace $V_k$ of a sequence of nested subspaces $V_0\subset V_1\subset\cdots\subset V_n$. A recovery algorithm for the one-space formulation, proposed in \cite{MPPY}, is proven here to be optimal and to have a simple formulation, if certain favorable bases are chosen to represent $V_n$ and the measurements. The major contribution of the present paper is to analyze the multi-space case for which it is shown that the set of all $u$ satisfying the given information can be described as the intersection of a family of known ellipsoids in $\mathcal{H}$. It follows that a near optimal recovery algorithm in the multi-space problem is to identify any point in this intersection which can provide a much better accuracy than in the one-space problem. Two iterative algorithms based on alternating projections are proposed for recovery in the multi-space problem. A detailed analysis of one of them provides a posteriori performance estimates for the iterates, stopping criteria, and convergence rates. Since the limit of the algorithm is a point in the intersection of the aforementioned ellipsoids, it provides a near optimal recovery for $u$.

NA Jun 15, 2015
Orthogonal Matching Pursuit under the Restricted Isometry Property

Albert Cohen, Wolfgang Dahmen, Ronald DeVore

This paper is concerned with the performance of Orthogonal Matching Pursuit (OMP) algorithms applied to a dictionary $\mathcal{D}$ in a Hilbert space $\mathcal{H}$. Given an element $f\in \mathcal{H}$, OMP generates a sequence of approximations $f_n$, $n=1,2,\dots$, each of which is a linear combination of $n$ dictionary elements chosen by a greedy criterion. It is studied whether the approximations $f_n$ are in some sense comparable to best $n$ term approximation from the dictionary. One important result related to this question is a theorem of Zhang \cite{TZ} in the context of sparse recovery of finite dimensional signals. This theorem shows that OMP exactly recovers an $n$-sparse signal, whenever the dictionary $\mathcal{D}$ satisfies a Restricted Isometry Property (RIP) of order $An$ for some constant $A$, and that the procedure is also stable in $\ell^2$ under measurement noise. The main contribution of the present paper is to give a structurally simpler proof of Zhang's theorem, formulated in the general context of $n$ term approximation from a dictionary in arbitrary Hilbert spaces $\mathcal{H}$. Namely, it is shown that OMP generates near best $n$ term approximations under a similar RIP condition.
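The greedy procedure described in this abstract can be sketched in a finite-dimensional setting as follows. This is an illustrative sketch only: it assumes the dictionary is given as the columns of a real matrix `D` with unit-norm columns and uses the standard OMP selection rule (largest absolute inner product with the current residual). The function name `omp` and its parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def omp(D, f, n_steps):
    """Sketch of Orthogonal Matching Pursuit in R^m.

    D: (m, N) matrix whose columns are the dictionary elements (assumed unit norm).
    f: target vector of shape (m,).
    Returns the coefficient vector of f_n and the list of selected column indices.
    """
    residual = f.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_steps):
        # Greedy criterion: the dictionary element most correlated with the residual
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = 0.0       # never reselect an element
        support.append(int(np.argmax(scores)))
        # Orthogonal projection of f onto the span of the selected elements
        coef, *_ = np.linalg.lstsq(D[:, support], f, rcond=None)
        residual = f - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x, support
```

After $n$ steps the approximation $f_n$ is `D @ x`, a linear combination of $n$ dictionary elements. Recomputing all coefficients by least squares at every step is what distinguishes OMP from plain matching pursuit, which only updates the coefficient of the newly selected element.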