NA · Nov 9, 2008
Localized linear polynomial operators and quadrature formulas on the sphere
Q. T. Le Gia, H. N. Mhaskar
The purpose of this paper is to construct universal, auto--adaptive, localized, linear, polynomial (-valued) operators based on scattered data on the (hyper--)sphere $\mathbb{S}^q$ ($q\ge 2$). The approximation and localization properties of our operators are studied theoretically in deterministic as well as probabilistic settings. Numerical experiments are presented to demonstrate their superiority over traditional least squares and discrete Fourier projection polynomial approximations. An essential ingredient in our construction is the construction of quadrature formulas based on scattered data, exact for integrating spherical polynomials of (moderately) high degree. Our formulas are based on scattered sites; i.e., in contrast to such well known formulas as Driscoll--Healy formulas, we need not choose the location of the sites in any particular manner. While previous attempts to construct such formulas have yielded formulas exact for spherical polynomials of degree at most 18, we are able to construct formulas exact for spherical polynomials of degree 178.
CA · Nov 16, 2008
Polynomial operators and local smoothness classes on the unit interval, II
H. N. Mhaskar
We prove the existence of quadrature formulas exact for integrating high degree polynomials with respect to Jacobi weights based on scattered data on the unit interval. We also obtain a characterization of local Besov spaces using the coefficients of a tight frame expansion.
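The abstract states an existence result rather than a construction, but the requirement itself (weights on given scattered nodes that integrate every polynomial up to a fixed degree exactly) can be illustrated. The sketch below is a minimal example assuming the Legendre case $\alpha=\beta=0$ of the Jacobi weight and taking the minimum-norm solution of the moment equations; the helper name `scattered_quadrature_weights` is hypothetical, and the weights it returns need not be positive, so this is not the paper's construction.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def scattered_quadrature_weights(nodes, degree):
    """Hypothetical helper: minimum-norm weights w with
    sum_j w_j p(x_j) == integral_{-1}^{1} p(x) dx for every polynomial p of degree <= degree.
    Assumes the Legendre weight (Jacobi with alpha = beta = 0) and at least
    degree + 1 distinct nodes, so the moment system has full row rank."""
    nodes = np.asarray(nodes, dtype=float)
    # Row k of V holds the Legendre polynomial P_k evaluated at every node.
    V = leg.legvander(nodes, degree).T
    # Moments of P_k with respect to dx on [-1, 1]: 2 for k = 0, 0 otherwise.
    moments = np.zeros(degree + 1)
    moments[0] = 2.0
    # For a full-row-rank underdetermined system, lstsq returns the exact
    # solution of minimal Euclidean norm.
    w, *_ = np.linalg.lstsq(V, moments, rcond=None)
    return w


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=60)   # scattered nodes we do not get to choose
w = scattered_quadrature_weights(x, degree=20)
print(np.dot(w, x**2), 2.0 / 3.0)     # agree up to rounding error
```

Pushing `degree` toward the number of nodes typically makes the moment system increasingly ill-conditioned, which hints at why exactness for high degrees from scattered data is a nontrivial matter.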
NA · Nov 24, 2010
A construction of linear bounded interpolatory operators on the torus
S. Chandrasekaran, H. N. Mhaskar
Let $q\ge 1$ be an integer. Given $M$ samples of a smooth function of $q$ variables, $2π$--periodic in each variable, we consider the problem of constructing a $q$--variate trigonometric polynomial of spherical degree $\mathcal{O}(M^{1/q})$ which interpolates the given data, remains bounded (independent of $M$) on $[-π,π]^q$, and converges to the function at an optimal rate on the set where the data becomes dense. We prove that the solution of an appropriate optimization problem leads to such an interpolant. Numerical examples are given to demonstrate that this procedure overcomes the Runge phenomenon when interpolating at equidistant nodes on $[-1,1]$, and also provides a respectable approximation for bivariate grid data, which does not become dense on the whole domain.
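For $q=1$, one plausible reading of "an appropriate optimization problem" is a minimum Sobolev-norm interpolant: among trigonometric polynomials $p(t)=\sum_{|k|\le N} c_k e^{ikt}$ with $2N+1\ge M$ that interpolate the data, pick the one minimizing $\sum_k (1+k^2)^s |c_k|^2$. The sketch below assumes exactly this objective with $s=2$; the helper names and the Runge-type test data are illustrative choices, not taken from the paper.

```python
import numpy as np


def min_norm_trig_interpolant(x, f, N, s=2.0):
    """Hypothetical helper: coefficients c_k, |k| <= N, of p(t) = sum_k c_k exp(i k t)
    with p(x_j) = f_j that minimize sum_k (1 + k^2)^s |c_k|^2.
    Assumes distinct nodes in [-pi, pi) and 2N + 1 >= len(x)."""
    k = np.arange(-N, N + 1)
    A = np.exp(1j * np.outer(x, k))           # interpolation matrix, shape (M, 2N + 1)
    inv_w = (1.0 + k**2) ** (-s / 2)          # diagonal of W^{-1}, W = Sobolev weights
    # With y = W c the problem is: minimize ||y|| subject to (A W^{-1}) y = f,
    # whose solution is the minimum-norm solution returned by lstsq.
    y, *_ = np.linalg.lstsq(A * inv_w, np.asarray(f, dtype=complex), rcond=None)
    return k, inv_w * y


def evaluate(k, c, t):
    """Real part of p(t) = sum_k c_k exp(i k t)."""
    return np.real(np.exp(1j * np.outer(t, k)) @ c)


M = 21
x = np.linspace(-np.pi, np.pi, M, endpoint=False)     # equidistant nodes
f = 1.0 / (1.0 + 25.0 * (x / np.pi) ** 2)             # illustrative Runge-type data
k, c = min_norm_trig_interpolant(x, f, N=3 * M)
t = np.linspace(-np.pi, np.pi, 2001)
print(np.max(np.abs(evaluate(k, c, t))))              # sup of |p| on a fine grid
```

The penalty on high frequencies is what keeps the interpolant from oscillating between nodes even though the polynomial degree $N$ exceeds the number of samples.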
NA · Oct 3, 2017
Minimum Sobolev norm interpolation of derivative data
S. Chandrasekaran, C. H. Gorman, H. N. Mhaskar
We study the problem of reconstructing a function on a manifold satisfying some mild conditions, given data on the values and some derivatives of the function at arbitrary points on the manifold. While the problem of finding a polynomial of two variables with total degree $\le n$ given the values of the polynomial and some of its derivatives at exactly the same number of points as the dimension of the polynomial space sometimes has no solution, we show that such a problem always has a solution in a very general situation if the degree of the polynomials is sufficiently large. We give estimates on how large the degree should be, and give explicit constructions for such a polynomial even in a far more general case. As the number of sampling points at which the data is available increases, our polynomials converge to the target function on the set where the sampling points are dense. Numerical examples in single and double precision show that this method is stable and of high order.
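As a much narrower illustration than the manifold setting of the abstract, the sketch below treats one-dimensional $2π$-periodic data and matches both values and first derivatives at scattered points with a minimum Sobolev-norm trigonometric polynomial, adding one row per derivative condition via $\frac{d}{dt}e^{ikt}=ik\,e^{ikt}$. The weight $(1+k^2)^s$, the exponent $s$, the node layout, and all names are assumptions made for illustration.

```python
import numpy as np


def min_norm_hermite_trig(x, f, df, N, s=2.0):
    """Hypothetical helper: coefficients c_k, |k| <= N, of p(t) = sum_k c_k exp(i k t)
    with p(x_j) = f_j and p'(x_j) = df_j, minimizing sum_k (1 + k^2)^s |c_k|^2.
    Assumes distinct nodes in [-pi, pi) and 2N + 1 >= 2 * len(x)."""
    k = np.arange(-N, N + 1)
    E = np.exp(1j * np.outer(x, k))
    A = np.vstack([E, E * (1j * k)])              # value rows, then derivative rows
    rhs = np.concatenate([f, df]).astype(complex)
    inv_w = (1.0 + k**2) ** (-s / 2)              # diagonal of W^{-1}
    y, *_ = np.linalg.lstsq(A * inv_w, rhs, rcond=None)
    return k, inv_w * y


def evaluate(k, c, t, derivative=False):
    """Real part of p(t), or of p'(t) when derivative=True."""
    coeffs = 1j * k * c if derivative else c
    return np.real(np.exp(1j * np.outer(t, k)) @ coeffs)


# Non-uniform but well-separated nodes (an illustrative choice).
x = np.linspace(-np.pi, np.pi, 12, endpoint=False) + 0.1 * np.cos(np.arange(12))
f = np.exp(np.sin(x))
df = np.cos(x) * np.exp(np.sin(x))
k, c = min_norm_hermite_trig(x, f, df, N=40)
```

Taking the degree $N$ well above the number of conditions mirrors the abstract's observation that such Hermite-type problems become solvable once the degree is large enough.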
LG · Feb 1, 2023
Local transfer learning from one data space to another
H. N. Mhaskar, Ryan O'Dowd
A fundamental problem in manifold learning is to approximate a functional relationship on data chosen randomly from a probability distribution supported on a low dimensional sub-manifold of a high dimensional ambient Euclidean space. The manifold is essentially defined by the data set itself and, typically, designed so that the data is dense on the manifold in some sense. The notion of a data space is an abstraction of a manifold encapsulating the essential properties that allow for function approximation. The problem of transfer learning (meta-learning) is to use the learning of a function on one data set to learn a similar function on a new data set. In terms of function approximation, this means lifting a function on one data space (the base data space) to another (the target data space). This viewpoint enables us to connect some inverse problems in applied mathematics (such as the inverse Radon transform) with transfer learning. In this paper we examine the question of such lifting when the data is assumed to be known only on a part of the base data space. We are interested in determining subsets of the target data space on which the lifting can be defined, and how the local smoothness of the function and its lifting are related.
LG · Feb 20, 2024
Learning on manifolds without manifold learning
H. N. Mhaskar, Ryan O'Dowd
Function approximation based on data drawn randomly from an unknown distribution is an important problem in machine learning. The manifold hypothesis assumes that the data is sampled from an unknown submanifold of a high dimensional Euclidean space. A great deal of research deals with obtaining information about this manifold, such as the eigendecomposition of the Laplace-Beltrami operator or coordinate charts, and using this information for function approximation. This two-step approach introduces extra errors in the approximation, stemming from the estimation of the basic quantities of the data manifold, in addition to the errors inherent in function approximation. In this paper, we project the unknown manifold as a submanifold of an ambient hypersphere and study the question of constructing a one-shot approximation using a specially designed sequence of localized spherical polynomial kernels on the hypersphere. Our approach does not require preprocessing of the data to obtain information about the manifold other than its dimension. We give optimal rates of approximation for relatively ``rough'' functions.
LG · Sep 29, 2025
A signal separation view of classification
H. N. Mhaskar, Ryan O'Dowd
The problem of classification in machine learning has often been approached in terms of function approximation. In this paper, we propose an alternative approach for classification in arbitrary compact metric spaces which, in theory, yields both the number of classes, and a perfect classification using a minimal number of queried labels. Our approach uses localized trigonometric polynomial kernels initially developed for the point source signal separation problem in signal processing. Rather than point sources, we argue that the various classes come from different probability distributions. The localized kernel technique developed for separating point sources is then shown to separate the supports of these distributions. This is done in a hierarchical manner in our MASC algorithm to accommodate touching/overlapping class boundaries. We illustrate our theory on several simulated and real life datasets, including the Salinas and Indian Pines hyperspectral datasets and a document dataset.
LG · May 12, 2021
A function approximation approach to the prediction of blood glucose levels
H. N. Mhaskar, S. V. Pereverzyev, M. D. van der Walt
The problem of real time prediction of blood glucose (BG) levels based on the readings from a continuous glucose monitoring (CGM) device is a problem of great importance in diabetes care, and therefore, has attracted a lot of research in recent years, especially based on machine learning. An accurate prediction with a 30, 60, or 90 minute prediction horizon has the potential of saving millions of dollars in emergency care costs. In this paper, we treat the problem as one of function approximation, where the value of the BG level at time $t+h$ (where $h$ is the prediction horizon) is considered to be an unknown function of $d$ readings prior to the time $t$. This unknown function may be supported in particular on some unknown submanifold of the $d$-dimensional Euclidean space. While manifold learning is classically done in a semi-supervised setting, where the entire data has to be known in advance, we use recent ideas to achieve an accurate function approximation in a supervised setting; i.e., construct a model for the target function. We use the state-of-the-art clinically relevant PRED-EGA grid to evaluate our results, and demonstrate that for a real life dataset, our method performs better than a standard deep network, especially in hypoglycemic and hyperglycemic regimes. One noteworthy aspect of this work is that the training data and test data may come from different distributions.
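Treating prediction as function approximation starts with turning a CGM series into input/output pairs. The sketch below is a minimal version assuming that the $d$ readings are the most recent ones up to and including time $t$ and that $h$ is counted in samples (with an assumed 5-minute sampling interval, a 30-minute horizon would be $h=6$); the helper name is hypothetical.

```python
def make_prediction_dataset(readings, d, h):
    """Hypothetical helper: build pairs (x_t, y_t) where x_t holds the d readings
    ending at index t and y_t is the reading h samples after t."""
    X, y = [], []
    for t in range(d - 1, len(readings) - h):
        X.append(list(readings[t - d + 1 : t + 1]))   # window of d past readings
        y.append(readings[t + h])                     # target h steps ahead
    return X, y


X, y = make_prediction_dataset(list(range(10)), d=3, h=2)
print(X[0], y[0])  # [0, 1, 2] 4
```

Any supervised regression method can then be fit to `X` and `y`; each row of `X` is a point in $d$-dimensional Euclidean space, which is where the unknown submanifold mentioned above would live.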
LG · Oct 8, 2020
A low discrepancy sequence on graphs
A. Cloninger, H. N. Mhaskar
Many applications such as election forecasting, environmental monitoring, health policy, and graph based machine learning require taking expectation of functions defined on the vertices of a graph. We describe a construction of a sampling scheme analogous to the so-called Leja points in complex potential theory that can be proved to give low discrepancy estimates for the approximation of the expected value by the empirical expected value based on these points. In contrast to classical potential theory where the kernel is fixed and the equilibrium distribution depends upon the kernel, we fix a probability distribution and construct a kernel (which represents the graph structure) for which the equilibrium distribution is the given probability distribution. Our estimates do not depend upon the size of the graph.
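Classical Leja points are chosen greedily, each new point minimizing the logarithmic potential of the points already chosen. A graph analogue can be sketched the same way: choose the next vertex $v$ minimizing $\sum_j K(v,x_j)$ over the vertices $x_j$ selected so far. This is a minimal sketch under explicit assumptions: the heat kernel $e^{-tL}$ of the combinatorial graph Laplacian stands in for the paper's kernel (which is constructed from the target probability distribution), and the greedy rule, start vertex, and helper names are illustrative.

```python
import numpy as np


def heat_kernel(adjacency, t=1.0):
    """exp(-t L) for the combinatorial Laplacian L = D - A (an assumed kernel choice)."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    lam, V = np.linalg.eigh(L)
    return (V * np.exp(-t * lam)) @ V.T


def greedy_leja_like(K, n_points, start=0):
    """Greedily add the unchosen vertex with the smallest potential sum_j K(v, x_j)."""
    chosen = [start]
    potential = K[:, start].copy()
    for _ in range(n_points - 1):
        masked = potential.copy()
        masked[chosen] = np.inf          # never pick a vertex twice
        nxt = int(np.argmin(masked))
        chosen.append(nxt)
        potential += K[:, nxt]           # update the potential of the chosen set
    return chosen


n = 30
path = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # path graph
points = greedy_leja_like(heat_kernel(path, t=5.0), n_points=6)
print(points)
```

The empirical average of a vertex function over `points` would then play the role of the empirical expected value in the abstract; whether it enjoys the paper's discrepancy bounds depends on the kernel, which here is only a stand-in.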
FA · Jul 10, 2019
Super-resolution meets machine learning: approximation of measures
H. N. Mhaskar
The problem of super-resolution in general terms is to recuperate a finitely supported measure $μ$ given finitely many of its coefficients $\hatμ(k)$ with respect to some orthonormal system. The interesting case concerns situations where the number of coefficients required is substantially smaller than a power of the reciprocal of the minimal separation among the points in the support of $μ$. In this paper, we consider the more severe problem of recuperating $μ$ approximately without any assumption on $μ$ beyond having a finite total variation. In particular, $μ$ may be supported on a continuum, so that the minimal separation among the points in the support of $μ$ is $0$. A variant of this problem is also of interest in machine learning as well as in the inverse problem of deconvolution. We define an appropriate notion of a distance between the target measure and its recuperated version, give an explicit expression for the recuperation operator, and estimate the distance between $μ$ and its approximation. We show that these estimates are the best possible in many different ways. We also explain why for a finitely supported measure the approximation quality of its recuperation is bounded from below if the amount of information is smaller than what is demanded in the super-resolution problem.
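When $μ=\sum_j a_j δ_{ω_j}$ is a finite combination of point masses on the circle, one localized-kernel approach from the point source separation literature forms $σ_n(x)=\sum_{|k|<n} h(|k|/n)\,\hatμ(k)e^{ikx}$ with a smooth low-pass filter $h$, so that $σ_n=\sum_j a_j Φ_n(\cdot-ω_j)$ with $Φ_n(x)=\sum_{|k|<n}h(|k|/n)e^{ikx}$ peaks near each $ω_j$. The sketch assumes the convention $\hatμ(k)=\sum_j a_j e^{-ikω_j}$ and a $\cos^2$ taper for $h$; it only illustrates localization and is not the paper's recuperation operator for general measures.

```python
import numpy as np


def h(t):
    """Assumed low-pass filter: 1 on [0, 1/2], cos^2 taper to 0 on [1/2, 1], 0 beyond."""
    t = np.abs(t)
    taper = np.cos(np.pi * (t - 0.5)) ** 2
    return np.where(t <= 0.5, 1.0, np.where(t >= 1.0, 0.0, taper))


def sigma_n(mu_hat, n, x):
    """sigma_n(x) = sum_{|k|<n} h(k/n) mu_hat(k) exp(i k x); mu_hat indexed by k = -n+1..n-1."""
    k = np.arange(-n + 1, n)
    return np.real(np.exp(1j * np.outer(x, k)) @ (h(k / n) * mu_hat))


omegas = np.array([-1.0, 0.5, 2.0])        # illustrative point-source locations
amps = np.array([1.0, 0.7, 1.2])           # illustrative amplitudes
n = 64
k = np.arange(-n + 1, n)
mu_hat = np.exp(-1j * np.outer(k, omegas)) @ amps   # mu_hat(k) = sum_j a_j exp(-i k w_j)
x = np.linspace(-np.pi, np.pi, 4001)
sig = sigma_n(mu_hat, n, x)
print(x[np.argmax(sig)])                   # close to 2.0, the source with largest amplitude
```

The smoothness of $h$ controls how fast $Φ_n$ decays away from the origin; a sharp cutoff (the Dirichlet kernel) would produce much heavier side lobes.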
LG · May 30, 2019
Function approximation by deep networks
H. N. Mhaskar, T. Poggio
We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On the other hand, a theorem called good propagation of errors allows one to `lift' theorems about shallow networks to those about deep networks with an appropriate choice of norms, smoothness, etc. We illustrate this in three contexts where each channel in the deep network calculates a spherical polynomial, a non-smooth ReLU network, or another zonal function network related closely with the ReLU network.
LG · Jan 10, 2019
A witness function based construction of discriminative models using Hermite polynomials
H. N. Mhaskar, A. Cloninger, X. Cheng
In machine learning, we are given a dataset of the form $\{(\mathbf{x}_j,y_j)\}_{j=1}^M$, drawn as i.i.d. samples from an unknown probability distribution $μ$; the marginal distribution for the $\mathbf{x}_j$'s being $μ^*$. We propose that rather than using a positive kernel such as the Gaussian for estimation of these measures, using a non-positive kernel that preserves a large number of moments of these measures yields an optimal approximation. We use multi-variate Hermite polynomials for this purpose, and prove optimal and local approximation results in a supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a `witness function' in classification problems. Thus, if the value of this estimator at a point $\mathbf{x}$ exceeds a certain threshold, then the point is reliably in a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class vs out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. This fact is demonstrated in a number of real world data sets including MNIST, CIFAR10, Science News documents, and LaLonde data sets.
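In one dimension, a non-positive kernel that preserves many moments can be sketched with orthonormal Hermite functions $ψ_k$ (Hermite polynomials times $e^{-x^2/2}$, suitably normalized): $K_n(x,y)=\sum_{k<n} h(k/n)ψ_k(x)ψ_k(y)$. Averaging $K_n(x,\cdot)$ over each class and taking the difference gives a witness-type function that is positive where the first class dominates. The filter $h$, the value of $n$, the synthetic two-class data, and all names are assumptions, and the permutation test the abstract uses to set a threshold is not reproduced.

```python
import numpy as np


def hermite_functions(x, n):
    """Orthonormal Hermite functions psi_0, ..., psi_{n-1} at x; shape (n, len(x))."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    psi = np.zeros((n, x.size))
    psi[0] = np.pi ** -0.25 * np.exp(-x**2 / 2)
    if n > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for k in range(1, n - 1):
        # Three-term recurrence for the normalized Hermite functions.
        psi[k + 1] = np.sqrt(2.0 / (k + 1)) * x * psi[k] - np.sqrt(k / (k + 1)) * psi[k - 1]
    return psi


def filter_h(t):
    """Assumed low-pass filter: 1 on [0, 1/2], cos^2 taper to 0 on [1/2, 1]."""
    t = np.abs(t)
    return np.where(t <= 0.5, 1.0, np.where(t >= 1.0, 0.0, np.cos(np.pi * (t - 0.5)) ** 2))


def witness(x, class_a, class_b, n=64):
    """F(x) = mean_a K_n(x, a) - mean_b K_n(x, b) with K_n built from Hermite functions."""
    weights = filter_h(np.arange(n) / n)
    mean_a = hermite_functions(class_a, n).mean(axis=1)
    mean_b = hermite_functions(class_b, n).mean(axis=1)
    return (weights * (mean_a - mean_b)) @ hermite_functions(x, n)


rng = np.random.default_rng(0)
class_a = -2.0 + 0.3 * rng.standard_normal(500)   # synthetic class a
class_b = 2.0 + 0.3 * rng.standard_normal(500)    # synthetic class b
F = witness(np.array([-2.0, 0.0, 2.0]), class_a, class_b)
print(F)  # positive near the class-a cluster, negative near class b
```

Because $K_n$ takes negative values, $F$ can undershoot zero slightly away from both clusters, which is one reason a calibrated threshold (such as the permutation test above) is needed before declaring a point reliably in-class.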
LG · Jul 18, 2017
A deep learning approach to diabetic blood glucose prediction
H. N. Mhaskar, S. V. Pereverzyev, M. D. van der Walt
We consider the question of 30-minute prediction of blood glucose levels measured by continuous glucose monitoring devices, using clinical data. While most studies of this nature deal with one patient at a time, we take a certain percentage of patients in the data set as training data, and test on the remainder of the patients; i.e., the machine need not re-calibrate on the new patients in the data set. We demonstrate how deep learning can outperform shallow networks in this example. One novelty is to demonstrate how a parsimonious deep representation can be constructed using domain knowledge.
LG · Jul 24, 2016
Deep nets for local manifold learning
Charles K. Chui, H. N. Mhaskar
The problem of extending a function $f$ defined on training data $\mathcal{C}$ on an unknown manifold $\mathbb{X}$ to the entire manifold and a tubular neighborhood of this manifold is considered in this paper. For $\mathbb{X}$ embedded in a high dimensional ambient Euclidean space $\mathbb{R}^D$, a deep learning algorithm is developed for finding a local coordinate system for the manifold {\bf without eigen--decomposition}, which reduces the problem to the classical problem of function approximation on a low dimensional cube. Deep nets (or multilayered neural networks) are proposed to accomplish this approximation scheme by using the training data. Our methods do not involve such optimization techniques as back--propagation, while assuring optimal (a priori) error bounds on the output in terms of the number of derivatives of the target function. In addition, these methods are universal, in that they do not require prior knowledge of the smoothness of the target function, but adjust the accuracy of approximation locally and automatically, depending only upon the local smoothness of the target function. Our ideas are easily extended to solve both the pre--image problem and the out--of--sample extension problem, with a priori bounds on the growth of the function thus extended.
LG · Sep 28, 2009
Eignets for function approximation on manifolds
H. N. Mhaskar
Let $\mathbb{X}$ be a compact, smooth, connected, Riemannian manifold without boundary, and let $G:\mathbb{X}\times\mathbb{X}\to \mathbb{R}$ be a kernel. Analogous to a radial basis function network, an eignet is an expression of the form $\sum_{j=1}^M a_jG(\circ,y_j)$, where $a_j\in\mathbb{R}$, $y_j\in\mathbb{X}$, $1\le j\le M$. We describe a deterministic, universal algorithm for constructing an eignet for approximating functions in $L^p(μ;\mathbb{X})$ for a general class of measures $μ$ and kernels $G$. Our algorithm yields linear operators. Using the minimal separation amongst the centers $y_j$ as the cost of approximation, we give modulus of smoothness estimates for the degree of approximation by our eignets, and show by means of a converse theorem that these are the best possible for every \emph{individual function}. We also give estimates on the coefficients $a_j$ in terms of the norm of the eignet. Finally, we demonstrate that if any sequence of eignets satisfies the optimal estimates for the degree of approximation of a smooth function, measured in terms of the minimal separation, then the derivatives of the eignets also approximate the corresponding derivatives of the target function in an optimal manner.
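On the circle, a compact one-dimensional manifold whose Laplace-Beltrami eigenfunctions are $e^{ikx}$, one can take $G(x,y)=\sum_{|k|\le K}(1+k^2)^{-s}e^{ik(x-y)}$, and an eignet is then $\sum_j a_jG(\circ,y_j)$. The sketch below computes the coefficients $a_j$ by plain kernel interpolation at the centers; this is a swapped-in, widely used way of fitting such an expansion, not the paper's linear-operator algorithm, and the truncation $K$, the exponent $s$, the target function, and all names are assumptions.

```python
import numpy as np


def circle_kernel(x, y, s=2.0, K=200):
    """G(x, y) = 1 + 2 * sum_{k=1}^{K} (1 + k^2)^(-s) cos(k (x - y)); shape (len(x), len(y))."""
    k = np.arange(1, K + 1)
    diff = np.subtract.outer(np.asarray(x, float), np.asarray(y, float))[..., None]
    return 1.0 + 2.0 * np.sum((1.0 + k**2) ** (-s) * np.cos(k * diff), axis=-1)


def fit_eignet(centers, values):
    """Coefficients a_j so that sum_j a_j G(y_i, y_j) = values_i at every center y_i."""
    return np.linalg.solve(circle_kernel(centers, centers), values)


def eval_eignet(x, centers, coeffs):
    """Evaluate sum_j a_j G(x, y_j) at the points x."""
    return circle_kernel(x, centers) @ coeffs


centers = np.linspace(-np.pi, np.pi, 20, endpoint=False)
target = lambda t: np.exp(np.sin(t))        # illustrative smooth periodic target
a = fit_eignet(centers, target(centers))
mid = centers + np.pi / 20                  # points halfway between centers
print(np.max(np.abs(eval_eignet(mid, centers, a) - target(mid))))
```

Since every Fourier coefficient of $G$ is positive, the interpolation matrix at distinct centers is positive definite, so the linear solve is well posed; its conditioning worsens as the minimal separation of the centers shrinks.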