Distributed Learning via Filtered Hyperinterpolation on Manifolds
The paper addresses the challenge of handling large datasets in manifold learning, with applications such as astrophysics and medical diagnosis. The contribution appears incremental, however, since it extends existing filtered hyperinterpolation to a distributed setting.
The paper tackles the problem of learning real-valued functions on manifolds from input-output data, presenting a distributed learning approach that splits the data among servers and combines the fitted sub-models into a global estimator. It proves approximation rates for both the distributed and non-distributed estimators; in the non-distributed case the approximation order is optimal.
Learning mappings of data on manifolds is an important topic in contemporary machine learning, with applications in astrophysics, geophysics, statistical physics, medical diagnosis, biochemistry, and 3D object analysis. This paper studies the problem of learning real-valued functions on manifolds through filtered hyperinterpolation of input-output data pairs, where the inputs may be sampled deterministically or at random and the outputs may be clean or noisy. Motivated by the problem of handling large data sets, it presents a parallel data processing approach which distributes the data-fitting task among multiple servers and synthesizes the fitted sub-models into a global estimator. We prove quantitative relations between the approximation quality of the learned function over the entire manifold, the type of target function, the number of servers, and the number and type of available samples. We obtain approximation rates of convergence for both the distributed and non-distributed approaches. For the non-distributed case, the approximation order is optimal.
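To make the split-and-combine idea concrete, the following is a minimal sketch on the simplest manifold, the unit circle, and not the paper's implementation. Several choices here are assumptions made only for illustration: the cosine-squared filter, the equal quadrature weights 1/|D_j| on each server, the combination of local estimators by weights |D_j|/|D|, and all function names (filter_h, filtered_kernel, local_estimator, distributed_estimator).

```python
import numpy as np

def filter_h(t):
    """Illustrative filter: 1 on [0, 1], (numerically) 0 for t >= 2,
    with a cosine-squared taper on [1, 2]. This choice is an assumption."""
    t = np.clip(np.asarray(t, dtype=float), 1.0, 2.0)
    return np.cos(0.5 * np.pi * (t - 1.0)) ** 2

def filtered_kernel(x, theta, n):
    """Filtered kernel on the circle (normalized measure):
    K_n(x, theta) = 1 + 2 * sum_{k=1}^{2n-1} H(k/n) * cos(k * (x - theta))."""
    k = np.arange(1, 2 * n)
    diff = x[:, None, None] - theta[None, :, None]
    return 1.0 + 2.0 * np.sum(filter_h(k / n) * np.cos(k * diff), axis=-1)

def local_estimator(x, theta, y, n):
    """Estimator fitted on one data subset, using equal quadrature
    weights 1/len(y) (an assumption of this sketch)."""
    return filtered_kernel(x, theta, n) @ y / len(y)

def distributed_estimator(x, theta, y, n, num_servers, rng):
    """Split the data at random among servers, fit locally, then combine
    the local fits with weights proportional to the subset sizes."""
    parts = np.array_split(rng.permutation(len(y)), num_servers)
    out = np.zeros(len(x))
    for idx in parts:
        weight = len(idx) / len(y)
        out += weight * local_estimator(x, theta[idx], y[idx], n)
    return out

# Toy data: clean samples of cos(theta) at 64 equispaced points.
rng = np.random.default_rng(0)
num_samples, degree = 64, 4
theta = 2.0 * np.pi * np.arange(num_samples) / num_samples
y = np.cos(theta)
x = np.linspace(0.0, 2.0 * np.pi, 50)

global_fit = local_estimator(x, theta, y, degree)
dist_fit = distributed_estimator(x, theta, y, degree, num_servers=4, rng=rng)
max_error = np.max(np.abs(global_fit - np.cos(x)))
```

In this simplified equal-weight setting, the size-weighted average of the local estimators coincides algebraically with the single-server estimator, so global_fit and dist_fit agree up to rounding. With quadrature weights computed separately on each server, the two need not coincide, which is the situation the paper's distributed rates are meant to quantify.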