DSLGMLOct 6, 2016

Efficient L1-Norm Principal-Component Analysis via Bit Flipping

arXiv:1610.01959v1132 citations
Originality Incremental advance
AI Analysis

This work addresses computational efficiency for robust PCA in big data applications, offering a practical solution for tasks like fault detection and medical diagnosis, though it is incremental as it builds on prior exact methods.

The paper tackles the prohibitive computational cost of exact L1-norm principal component analysis (L1-PCA) for large datasets by proposing a novel suboptimal algorithm with cost comparable to standard L2-PCA. The algorithm calculates exact optimal L1-PCs with high frequency, achieving higher L1-PC optimization metric than alternatives, and demonstrates superiority over L2-PCs in applications like dimensionality reduction and disease diagnosis from genomic data.

It was shown recently that the $K$ L1-norm principal components (L1-PCs) of a real-valued data matrix $\mathbf X \in \mathbb R^{D \times N}$ ($N$ data samples of $D$ dimensions) can be exactly calculated with cost $\mathcal{O}(2^{NK})$ or, when advantageous, $\mathcal{O}(N^{dK - K + 1})$ where $d=\mathrm{rank}(\mathbf X)$, $K<d$ [1],[2]. In applications where $\mathbf X$ is large (e.g., "big" data of large $N$ and/or "heavy" data of large $d$), these costs are prohibitive. In this work, we present a novel suboptimal algorithm for the calculation of the $K < d$ L1-PCs of $\mathbf X$ of cost $\mathcal O(ND \mathrm{min} \{ N,D\} + N^2(K^4 + dK^2) + dNK^3)$, which is comparable to that of standard (L2-norm) PC analysis. Our theoretical and experimental studies show that the proposed algorithm calculates the exact optimal L1-PCs with high frequency and achieves higher value in the L1-PC optimization metric than any known alternative algorithm of comparable computational cost. The superiority of the calculated L1-PCs over standard L2-PCs (singular vectors) in characterizing potentially faulty data/measurements is demonstrated with experiments on data dimensionality reduction and disease diagnosis from genomic data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes