ML DIS-NN CV LG

Feb 15, 2018

Natural data structure extracted from neighborhood-similarity graphs

Tom Lorimer, Karlis Kanders, Ruedi Stoop

arXiv:1803.00500v1

3 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for unbiased structural analysis of high-dimensional data, though it appears incremental because it builds on existing graph-based methods without a paradigm shift.

The authors tackled the problem of distortion and bias in dimensionality reduction and clustering methods by proposing a non-iterative framework that encodes neighborhood similarities as a sparse graph, enabling transparent interpretation without altering data dimensions or metrics.

'Big' high-dimensional data are commonly analyzed in low dimensions, after performing a dimensionality-reduction step that inherently distorts the data structure. For the same purpose, clustering methods are also often used. These methods also introduce a bias, either by starting from the assumption of a particular geometric form of the clusters, or by using iterative schemes to enhance cluster contours, with uncontrollable consequences. The goal of data analysis should, however, be to encode and detect structural data features at all scales and densities simultaneously, without assuming a parametric form of data point distances, or modifying them. We propose a novel approach that directly encodes data point neighborhood similarities as a sparse graph. Our non-iterative framework permits a transparent interpretation of data, without altering the original data dimension and metric. Several natural and synthetic data applications demonstrate the efficacy of our novel approach.
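
To make the idea of "encoding neighborhood similarities as a sparse graph" concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. As a stand-in it uses a shared-nearest-neighbor rule: two points are linked only if each lies in the other's k-neighborhood, and the edge weight is the number of neighbors they have in common. The function names `knn_sets` and `similarity_graph` and the parameters `k` and `min_shared` are hypothetical choices made for this illustration. Like the framework described above, the sketch is non-iterative and works on the original points with an unmodified metric (Euclidean distance is assumed here).

```python
import math


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbours.

    Assumption: Euclidean distance on the original, unreduced coordinates.
    """
    neighbours = []
    for i, p in enumerate(points):
        # Rank all other points by distance to p; ties are broken by index.
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbours.append(set(order[:k]))
    return neighbours


def similarity_graph(points, k, min_shared=1):
    """Build a sparse graph as a dict {(i, j): shared_count} with i < j.

    Illustrative rule (an assumption, not the paper's definition):
    i and j are linked only if each is in the other's k-neighbourhood
    and their neighbourhoods overlap in at least `min_shared` points.
    """
    nbrs = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in nbrs[i]:
            if i < j and i in nbrs[j]:
                shared = len(nbrs[i] & nbrs[j])
                if shared >= min_shared:
                    edges[(i, j)] = shared
    return edges


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = similarity_graph(points, k=2)
for (i, j), weight in sorted(edges.items()):
    print(i, j, weight)
```

In this toy example every edge stays inside one of the two groups, so the group structure can be read directly from the graph's connected components, with no cluster shape assumed and no coordinates altered. The brute-force neighbor search is quadratic in the number of points and is meant only for illustration.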

