Denoising data reduction algorithm for Topological Data Analysis

Seonmi Choi, Semin Oh, Jeong Rye Park, Seung Yeop Yang

arXiv:2603.29248

AI Analysis

The paper addresses computational cost and noise in topological data analysis, which matters to researchers handling large datasets, though it appears incremental: it refines existing grid-based methods.

The paper addresses the difficulty of applying persistent homology to large, noisy datasets. It proposes the Refined Characteristic Lattice Algorithm (RCLA), which combines data reduction and denoising to remove noise while preserving essential structure, and it reports that RCLA consistently outperforms existing methods in experiments.

Persistent homology is a central tool in topological data analysis, but its application to large and noisy datasets is often limited by computational cost and the presence of spurious topological features. Noise not only increases data size but also obscures the underlying structure of the data. In this paper, we propose the Refined Characteristic Lattice Algorithm (RCLA), a grid-based method that integrates data reduction with threshold-based denoising in a single procedure. By incorporating a threshold parameter $k$, RCLA removes noise while preserving the essential structure of the data in a single pass. We further provide a theoretical guarantee by proving a stability theorem under a homogeneous Poisson noise model, which bounds the bottleneck distance between the persistence diagrams of the output and the underlying shape with high probability. In addition, we introduce an automatic parameter selection method based on nearest-neighbor statistics. Experimental results demonstrate that RCLA consistently outperforms existing methods, and its effectiveness is further validated on a 3D shape classification task.
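
To make the idea of grid-based reduction with a threshold parameter $k$ concrete, here is a minimal sketch in Python. It is not the authors' RCLA, whose details the abstract does not give. It assumes one plausible reading: bin the points into axis-aligned cells, discard cells holding fewer than $k$ points as noise, and keep one representative (the cell center) per surviving cell. The function name `grid_threshold_reduce` and the parameter `cell_size` are hypothetical.

```python
from collections import Counter
from math import floor


def grid_threshold_reduce(points, cell_size, k):
    """Illustrative sketch only, NOT the paper's RCLA.

    Assumption: a cell counts as signal when it holds at least k points,
    and each surviving cell is represented by its center.
    """
    # Map every point to the integer index of the grid cell containing it,
    # and count how many points land in each occupied cell.
    counts = Counter(
        tuple(floor(x / cell_size) for x in p) for p in points
    )
    # Threshold step: drop sparsely populated cells, then emit cell centers.
    return sorted(
        tuple((i + 0.5) * cell_size for i in cell)
        for cell, n in counts.items()
        if n >= k
    )


# Three clustered points plus one isolated point that stands in for noise.
cloud = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (2.5, 2.5)]
reduced = grid_threshold_reduce(cloud, cell_size=1.0, k=2)
```

In this sketch the counting and the thresholding both come from a single traversal of the input, which is one way to read the "single pass" claim. Under this reading, a larger `k` removes more isolated points but risks deleting thinly sampled parts of the underlying shape.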
