LG · Aug 26, 2023

Large-scale gradient-based training of Mixtures of Factor Analyzers

arXiv:2308.13778v12.0 · h-index: 20

Originality: Incremental advance

AI Analysis

This work addresses scalability and efficiency problems in probabilistic modeling for high-dimensional data, offering a practical solution for tasks such as image generation and outlier detection, though it is incremental in improving existing MFA methods.

The paper tackles the challenge of training Mixtures of Factor Analyzers (MFA) on high-dimensional data such as images. It develops a stochastic gradient descent method that simplifies training and avoids the problems of batch algorithms like EM, and it shows that inference and sampling require no matrix inversions once training is complete.

Gaussian Mixture Models (GMMs) are a standard tool in data analysis. However, they face problems when applied to high-dimensional data (e.g., images) due to the size of the required full covariance matrices (CMs), whereas the use of diagonal or spherical CMs often imposes restrictions that are too severe. The Mixture of Factor Analyzers (MFA) model is an important extension of GMMs that allows smooth interpolation between diagonal and full CMs based on the number of factor loadings $l$. MFA has been applied successfully to modeling high-dimensional image data. This article contributes both a theoretical analysis and a new method for efficient high-dimensional MFA training by stochastic gradient descent, starting from random centroid initializations. This greatly simplifies the training and initialization process, and avoids the problems of batch-type algorithms such as Expectation-Maximization (EM) when training with huge amounts of data. In addition, by exploiting the matrix determinant lemma, we prove that MFA training and inference/sampling can be performed based on precision matrices, so that no matrix inversions are required after training is completed. At training time, the method requires the inversion of $l\times l$ matrices only. Besides the theoretical analysis and proofs, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
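To illustrate why only $l\times l$ matrices need to be inverted, here is a minimal NumPy sketch of the log-density of a single MFA component. It assumes the textbook covariance form $\Sigma = \Lambda\Lambda^\top + D$ with diagonal $D$, and combines the matrix determinant lemma with the Woodbury identity. The function name `mfa_component_logpdf` and the argument names `Lambda` and `d_diag` are hypothetical, and the paper's own precision-matrix parameterization may well differ from this form.

```python
import numpy as np

def mfa_component_logpdf(x, mu, Lambda, d_diag):
    """Log-density of N(mu, Lambda @ Lambda.T + diag(d_diag)) for each row of x.

    Assumed shapes (illustrative, not taken from the paper):
      x: (n, dim), mu: (dim,), Lambda: (dim, l), d_diag: (dim,) with positive entries.
    """
    dim, l = Lambda.shape
    diff = x - mu                                # (n, dim)
    scaled = Lambda / d_diag[:, None]            # D^{-1} Lambda, shape (dim, l)
    M = np.eye(l) + Lambda.T @ scaled            # small (l, l) matrix

    # Matrix determinant lemma: log det(Sigma) = log det(M) + log det(D)
    _, logdet_M = np.linalg.slogdet(M)
    logdet_sigma = logdet_M + np.sum(np.log(d_diag))

    # Woodbury identity: Sigma^{-1} = D^{-1} - D^{-1} Lambda M^{-1} Lambda^T D^{-1}
    proj = diff @ scaled                         # (n, l)
    correction = np.sum(proj * np.linalg.solve(M, proj.T).T, axis=1)
    maha = np.sum(diff ** 2 / d_diag, axis=1) - correction

    return -0.5 * (dim * np.log(2.0 * np.pi) + logdet_sigma + maha)
```

The only linear solve involves `M`, which is $l\times l$, so the cost per sample stays linear in the data dimension rather than cubic. The full $\text{dim}\times\text{dim}$ covariance matrix is never formed.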
