QM LG | Oct 27, 2016

A random version of principal component analysis in data clustering

arXiv:1610.08664v1

Originality: Synthesis-oriented

AI Analysis

The paper addresses a limitation in data clustering that affects researchers working with high-dimensional data, but it appears incremental because it modifies an existing method.

The paper tackles the constraints that PCA places on sample size for high-dimensional data by introducing a modified algorithm that works on both well-dimensioned and degenerate datasets. The abstract reports no specific numerical results.

Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance or correlation matrix of the analyzed data. However, to work properly with high-dimensional data, PCA imposes severe mathematical constraints on the minimum number of distinct replicates or samples that must be included in the analysis. Here we show that a modified algorithm works not only on well-dimensioned datasets, but also on degenerate ones.
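The sample-size constraint mentioned in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's modified algorithm, which the abstract does not describe. It only shows the standard fact that a sample covariance matrix built from n samples in p dimensions has rank at most n - 1, so with fewer samples than features most principal directions are undetermined. The function name `covariance_rank`, the Gaussian test data, and the chosen sizes are assumptions made for illustration.

```python
import numpy as np


def covariance_rank(n_samples, n_features, seed=0):
    """Rank of the sample covariance matrix of random Gaussian data.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Rows are samples (replicates), columns are features (dimensions).
    X = rng.standard_normal((n_samples, n_features))
    # rowvar=False tells NumPy that each column is a variable.
    cov = np.cov(X, rowvar=False)
    return int(np.linalg.matrix_rank(cov))


# Well-dimensioned case: many more samples than features.
rank_well = covariance_rank(n_samples=200, n_features=10)

# Degenerate case: fewer samples than features, so the covariance
# matrix is rank-deficient (rank is at most n_samples - 1).
rank_degenerate = covariance_rank(n_samples=5, n_features=10)

print("well-dimensioned rank:", rank_well)
print("degenerate rank:", rank_degenerate)
```

In the degenerate case the covariance matrix is singular, so only a few principal components carry any variance. Dataset shapes like this one are what the abstract calls degenerate.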
