ML, Nov 3, 2015

PCA-Based Out-of-Sample Extension for Dimensionality Reduction

arXiv:1511.00831v1, 24 citations
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck for researchers and practitioners who analyze large-scale data, though the advance is incremental because it builds on existing extension methods such as Nyström.

The paper tackles the computational expense of applying dimensionality reduction to massive or accumulating data. It proposes an out-of-sample extension scheme that uses PCA and the intrinsic geometry of the data, proves that the scheme's error is bounded, and provides abnormality detection for new data points.

Dimensionality reduction methods are very common in high-dimensional data analysis. Algorithms for dimensionality reduction are typically computationally expensive, which makes them impractical for analyzing massive amounts of data. For example, recomputing the reduction every time new data accumulates is computationally prohibitive. In this paper, an out-of-sample extension scheme, which is used as a complementary method for dimensionality reduction, is presented. We describe an algorithm that performs an out-of-sample extension to newly arrived data points. Unlike other extension algorithms such as the Nyström algorithm, the proposed algorithm uses the intrinsic geometry of the data and properties of the dimensionality reduction map. We prove that the error of the proposed algorithm is bounded. In addition to the out-of-sample extension, the algorithm provides a degree of abnormality for any newly arrived data point.
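
The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea of a local-PCA out-of-sample extension, not the paper's method. It assumes a training set `X` with an already computed embedding `Y`, and it embeds a new point by fitting a linear map on the local PCA coordinates of its nearest neighbors. The function name `extend_out_of_sample`, the parameters `k` and `d`, and the use of the distance from the local PCA subspace as the abnormality score are all illustrative assumptions.

```python
import numpy as np


def extend_out_of_sample(X, Y, x_new, k=10, d=2):
    """Embed x_new using a local PCA fit around its k nearest training points.

    X: (n, D) training points; Y: (n, m) their low-dimensional embedding.
    Returns (y_new, abnormality). Hypothetical helper, not the paper's algorithm.
    """
    # k nearest training points in the ambient space.
    dists = np.linalg.norm(X - x_new, axis=1)
    idx = np.argsort(dists)[:k]
    Xk, Yk = X[idx], Y[idx]

    # Local PCA: top-d principal directions of the centered neighborhood.
    mu_x, mu_y = Xk.mean(axis=0), Yk.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xk - mu_x, full_matrices=False)
    basis = Vt[:d].T  # shape (D, d)

    # Least-squares linear map from local PCA coordinates to the embedding.
    Tk = (Xk - mu_x) @ basis  # shape (k, d)
    A, *_ = np.linalg.lstsq(Tk, Yk - mu_y, rcond=None)  # shape (d, m)

    # Project the new point onto the local subspace and map it.
    t_new = (x_new - mu_x) @ basis
    y_new = mu_y + t_new @ A

    # Assumed abnormality score: distance of x_new from the local PCA subspace.
    residual = (x_new - mu_x) - basis @ t_new
    return y_new, float(np.linalg.norm(residual))
```

In this sketch a point lying close to the local subspace of its neighbors gets a score near zero, while a point far from the sampled geometry gets a large score, which is one simple way to read "degree of abnormality". The cost per new point is dominated by the neighbor search and a small SVD, so the full embedding never has to be recomputed.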

