ML · Mar 16

Data-intrinsic approximation in metric spaces

arXiv:2510.13496 · 34.41 citations · h-index: 10
AI Analysis

This work addresses computational efficiency in data analysis for applications that require large-scale processing. It appears incremental, however, since it builds on existing approximation theory and methods.

The paper studies how to approximate labeled data in metric spaces in order to reduce computational cost. It introduces the discrete modulus of continuity as a data-intrinsic measure of regularity, develops algorithms to compute it, and validates the approach with numerical studies.
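The summary does not give the paper's exact definition or algorithm. As a minimal sketch, assume the usual modulus of continuity restricted to the finite site set X: ω_f(δ) = max{ d_Y(f(x), f(x')) : x, x' ∈ X, d_X(x, x') ≤ δ }. A brute-force baseline can then tabulate ω_f for every δ at once. It sorts all site pairs by distance and keeps a running maximum of label distances. This costs O(n² log n) time and O(n²) memory, so it is only a reference point. It is not the paper's efficient algorithm, which the abstract connects to combinatorial optimization. The names `modulus_table` and `modulus` are hypothetical.

```python
import bisect
from typing import Callable, List, Sequence, Tuple


def modulus_table(
    sites: Sequence,
    values: Sequence,
    d_x: Callable,
    d_y: Callable,
) -> Tuple[List[float], List[float]]:
    """Tabulate omega_f(delta) = max{ d_y(f(x), f(x')) : d_x(x, x') <= delta }.

    Returns sorted, distinct site distances `deltas` and `omegas`, where
    omegas[i] is omega_f(delta) for deltas[i] <= delta < deltas[i + 1].
    """
    n = len(sites)
    # All unordered pairs as (site distance, label distance): O(n^2) memory.
    pairs = [
        (d_x(sites[i], sites[j]), d_y(values[i], values[j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]
    pairs.sort(key=lambda p: p[0])

    deltas: List[float] = []
    omegas: List[float] = []
    running = 0.0  # largest label distance among pairs within the current radius
    for dx, dy in pairs:
        running = max(running, dy)
        if deltas and deltas[-1] == dx:
            omegas[-1] = running  # same breakpoint: raise its value
        else:
            deltas.append(dx)
            omegas.append(running)
    return deltas, omegas


def modulus(deltas: List[float], omegas: List[float], delta: float) -> float:
    """Evaluate omega_f(delta) by binary search over the tabulated breakpoints."""
    k = bisect.bisect_right(deltas, delta)
    # If no two distinct sites are within delta, only x == x' counts, giving 0.
    return omegas[k - 1] if k > 0 else 0.0
```

For example, take the sites 0, 1, 2, 4 on the real line with labels f(x) = x². Then ω_f(1) = 3, from the pair (1, 2), and ω_f(2.5) = 12, from the pair (2, 4).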

Analysis and processing of data is a vital part of our modern society and requires vast amounts of computational resources. To reduce the computational burden, compressing and approximating data has become a central topic. We consider the approximation of labeled data samples, mathematically described as site-to-value maps between finite metric spaces. Within this setting, we identify the discrete modulus of continuity as an effective data-intrinsic quantity to measure the regularity of site-to-value maps without imposing further structural assumptions. We investigate the consistency of the discrete modulus of continuity in the infinite data limit and propose an algorithm for its efficient computation. Building on these results, we present a sample-based approximation theory for labeled data. For data subject to statistical uncertainty, we consider multilevel approximation spaces and a variant of the multilevel Monte Carlo method to compute statistical quantities of interest. Our considerations connect approximation theory for labeled data in metric spaces to the covering problem for (random) balls on the one hand, and the efficient evaluation of the discrete modulus of continuity to combinatorial optimization on the other. We provide extensive numerical studies to illustrate the feasibility of the approach and to validate our theoretical results.
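The abstract links approximation to the covering problem for balls. One elementary illustration of that link uses a nearest-centre approximation. This is an assumption for illustration and not necessarily the paper's approximation spaces. Pick centres S ⊂ X with covering radius h = max over x of min over s ∈ S of d_X(x, s). Give each site x the label of its nearest centre s(x). Since d_X(x, s(x)) ≤ h, the pointwise error satisfies d_Y(f(x), f(s(x))) ≤ ω_f(h), so a smaller covering radius gives a tighter error bound. The sketch assumes Euclidean points in the unit square and real-valued labels, with hypothetical test labels sin(3x) + y². To choose centres it substitutes the standard greedy farthest-point (Gonzalez) heuristic, a 2-approximation for the k-centre problem. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_point_centres(points: Sequence[Point], k: int, start: int = 0) -> List[int]:
    """Greedy k-centre heuristic: repeatedly add the site farthest from the chosen centres."""
    centres = [start]
    gap = [math.dist(p, points[start]) for p in points]  # distance to nearest chosen centre
    while len(centres) < min(k, len(points)):
        i = max(range(len(points)), key=gap.__getitem__)
        centres.append(i)
        gap = [min(g, math.dist(p, points[i])) for g, p in zip(gap, points)]
    return centres


def covering_radius(points: Sequence[Point], centres: List[int]) -> float:
    """h = largest distance from any site to its nearest centre."""
    return max(min(math.dist(p, points[c]) for c in centres) for p in points)


def nearest_centre_approx(points: Sequence[Point], values: Sequence[float], centres: List[int]) -> List[float]:
    """Piecewise-constant approximation: each site takes its nearest centre's label."""
    return [values[min(centres, key=lambda c: math.dist(p, points[c]))] for p in points]


def modulus(points: Sequence[Point], values: Sequence[float], delta: float) -> float:
    """Brute-force discrete modulus of continuity for real labels, O(n^2) pairs."""
    n = len(points)
    return max(
        abs(values[i] - values[j])
        for i in range(n)
        for j in range(n)
        if math.dist(points[i], points[j]) <= delta
    )


def demo(n: int = 300, k: int = 25, seed: int = 0) -> Tuple[float, float, float]:
    """Return (covering radius h, max approximation error, omega_f(h))."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    vals = [math.sin(3 * x) + y * y for x, y in pts]  # hypothetical test labels
    centres = farthest_point_centres(pts, k)
    h = covering_radius(pts, centres)
    approx = nearest_centre_approx(pts, vals, centres)
    err = max(abs(a - b) for a, b in zip(vals, approx))
    return h, err, modulus(pts, vals, h)


if __name__ == "__main__":
    h, err, bound = demo()
    print(f"covering radius {h:.3f}, max error {err:.3f}, omega_f(h) {bound:.3f}")
```

The bound err ≤ ω_f(h) holds for any choice of centres. The greedy step only tries to make h small. This is where the covering problem enters.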
