ML, Jan 9, 2013

On the Incommensurability Phenomenon

arXiv:1301.1954v5, 6 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses a practical problem for data analysts who apply PCA to noisy data sets, though its contribution is incremental because it builds on existing statistical theory.

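The paper quantifies the "incommensurability phenomenon," in which two data sets measured from the same underlying process yield an inordinately large Procrustean fitting-error after PCA is applied to each separately. Under specified conditions, the square fitting-error of the normalized lower-dimensional data sets is asymptotically a convex combination, weighted via a correlation parameter, of the Hausdorff distance between the projection subspaces and the maximum possible square fitting-error.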

Suppose that two large, multi-dimensional data sets are each noisy measurements of the same underlying random process, and principal components analysis is performed separately on the data sets to reduce their dimensionality. In some circumstances it may happen that the two lower-dimensional data sets have an inordinately large Procrustean fitting-error between them. The purpose of this manuscript is to quantify this "incommensurability phenomenon." In particular, under specified conditions, the square Procrustean fitting-error of the two normalized lower-dimensional data sets is (asymptotically) a convex combination (via a correlation parameter) of the Hausdorff distance between the projection subspaces and the maximum possible value of the square Procrustean fitting-error for normalized data. We show how this gives rise to the incommensurability phenomenon, and we employ illustrative simulations as well as a real data experiment to explore how the incommensurability phenomenon may have an appreciable impact.
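The setup described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's simulation: the Gaussian latent process, the additive isotropic noise, the unit-Frobenius-norm normalization, and the function names `pca_project`, `squared_procrustes_error`, and `simulate` are all assumptions made for this example. Under this particular normalization the squared Procrustean fitting-error lies between 0 and 2.

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X onto its top-d principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def squared_procrustes_error(Y1, Y2):
    """Minimum over orthogonal Q of ||Y1 Q - Y2||_F^2, after scaling
    Y1 and Y2 to unit Frobenius norm (an assumed normalization)."""
    Y1 = Y1 / np.linalg.norm(Y1)
    Y2 = Y2 / np.linalg.norm(Y2)
    # For unit-norm inputs the minimum equals 2 - 2 * (nuclear norm of Y1^T Y2).
    s = np.linalg.svd(Y1.T @ Y2, compute_uv=False)
    return 2.0 - 2.0 * s.sum()

def simulate(latent_std, d, n=2000, noise_std=0.1, seed=0):
    """Two noisy measurements of one latent sample, reduced separately by PCA.

    latent_std: per-coordinate standard deviations of the (assumed Gaussian)
    latent process; noise_std: standard deviation of the added noise.
    """
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, len(latent_std))) * np.asarray(latent_std)
    X1 = Z + noise_std * rng.normal(size=Z.shape)
    X2 = Z + noise_std * rng.normal(size=Z.shape)
    return squared_procrustes_error(pca_project(X1, d), pca_project(X2, d))

if __name__ == "__main__":
    # Well-separated variances at the cutoff d=2.
    print(simulate([3.0, 2.0, 0.5], d=2))
    # Tied variances at the cutoff, so the retained subspace is less stable.
    print(simulate([3.0, 1.0, 1.0], d=2, noise_std=1.0))
```

Comparing the two calls across several seeds and noise levels gives a rough feel for when separately fitted projections stop agreeing; how large the error becomes depends on the sample size and noise level chosen here, not on values taken from the paper.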

