On Robust Probabilistic Principal Component Analysis using Multivariate $t$-Distributions
This work resolves a theoretical inconsistency in robust PPCA methods, which matters to researchers in statistics and machine learning who work with outlier-prone data, though it is incremental because it builds on existing $t$-distribution approaches.
The paper addresses a misrepresentation in robust probabilistic principal component analysis (PPCA) by clarifying the equivalence between multivariate $t$-distribution frameworks and the hierarchical models used to implement them. It also proposes a Monte Carlo expectation-maximization (MCEM) algorithm for implementation, with simulation studies showing improved robustness to outliers.
Probabilistic principal component analysis (PPCA) is a probabilistic reformulation of principal component analysis (PCA) as a Gaussian latent variable model. To improve the robustness of PPCA, it has been proposed to replace the underlying Gaussian distributions with multivariate $t$-distributions. Based on the representation of the $t$-distribution as a scale mixture of Gaussian distributions, a hierarchical model is used for implementation. However, the hierarchical model implemented in the existing literature does not yield an interpretation equivalent to the multivariate $t$ framework it is meant to represent. In this paper, we present two sets of equivalence relationships between the high-level multivariate $t$-PPCA framework and the hierarchical model used for implementation. In doing so, we clarify a current misrepresentation in the literature by specifying the correct correspondence. In addition, we discuss the performance of different multivariate $t$ robust PPCA methods, both in theory and in simulation studies, and propose a novel Monte Carlo expectation-maximization (MCEM) algorithm to implement one general type of such models.
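The scale-mixture idea mentioned above can be illustrated with a minimal sketch. It assumes one common shared-scale hierarchy (an illustrative choice, not necessarily the specification the paper ends up endorsing): a single weight $u \sim \mathrm{Gamma}(\nu/2, \text{rate}=\nu/2)$, then $z \mid u \sim N(0, I_q/u)$ and $\epsilon \mid u \sim N(0, \sigma^2 I_d/u)$, with $x = Wz + \mu + \epsilon$. Given $u$, $x \sim N(\mu, (WW^\top + \sigma^2 I)/u)$, so marginally $x$ is multivariate $t_\nu$ with scale $WW^\top + \sigma^2 I$ and, for $\nu > 2$, covariance $\frac{\nu}{\nu-2}(WW^\top + \sigma^2 I)$. The function names and the values of `W`, `sigma`, and `nu` are hypothetical. The sketch does not reproduce the paper's MCEM algorithm.

```python
import math
import random


def sample_t_ppca(n, W, mu, sigma, nu, seed=0):
    """Draw n observations x = W z + mu + eps from a shared-scale hierarchy.

    Assumed (illustrative) hierarchy:
        u       ~ Gamma(shape=nu/2, rate=nu/2)
        z | u   ~ N(0, I_q / u)
        eps | u ~ N(0, sigma^2 I_d / u)
    """
    rng = random.Random(seed)
    d, q = len(W), len(W[0])
    samples = []
    for _ in range(n):
        # random.gammavariate takes (shape, scale); scale = 1 / rate = 2 / nu.
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)
        s = 1.0 / math.sqrt(u)  # common std-dev multiplier shared by z and eps
        z = [rng.gauss(0.0, s) for _ in range(q)]
        x = [
            mu[i] + sum(W[i][j] * z[j] for j in range(q)) + rng.gauss(0.0, sigma * s)
            for i in range(d)
        ]
        samples.append(x)
    return samples


def sample_covariance(xs):
    """Unbiased sample covariance matrix of a list of d-dimensional points."""
    n, d = len(xs), len(xs[0])
    mean = [sum(x[i] for x in xs) / n for i in range(d)]
    return [
        [sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in xs) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def implied_covariance(W, sigma, nu):
    """nu / (nu - 2) * (W W^T + sigma^2 I), valid only for nu > 2."""
    d, q = len(W), len(W[0])
    factor = nu / (nu - 2.0)
    return [
        [
            factor * (sum(W[i][k] * W[j][k] for k in range(q)) + (sigma**2 if i == j else 0.0))
            for j in range(d)
        ]
        for i in range(d)
    ]


if __name__ == "__main__":
    W = [[2.0], [1.0]]  # hypothetical 2 x 1 loading matrix
    xs = sample_t_ppca(200_000, W, [0.0, 0.0], sigma=0.5, nu=5.0)
    print(sample_covariance(xs))
    print(implied_covariance(W, 0.5, 5.0))
```

The empirical covariance should approach the implied one as `n` grows. Drawing $z$ and $\epsilon$ with separate, independent weights gives the same second moments, so a covariance check like this cannot tell the two constructions apart. They differ in their joint distribution, not in their covariance.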