Consistent Estimation of Low-Dimensional Latent Structure in High-Dimensional Data
This paper provides a method for dimension reduction and latent variable estimation in high-dimensional datasets, such as genomic data, though it appears incremental because it builds on existing linear latent structure frameworks.
The paper tackles the problem of extracting low-dimensional linear latent structure from high-dimensional data. It shows that, under mild conditions, this structure can be consistently recovered using only the first and second moments, and it verifies the results through simulations and an application to genomic data.
We consider the problem of extracting a low-dimensional, linear latent variable structure from high-dimensional random variables. Specifically, we show that under mild conditions, and when this structure manifests itself as a linear space that spans the conditional means, it is possible to consistently recover the structure using only information up to the second moments of these random variables. This finding, specialized to one-parameter exponential families whose variance function is quadratic in their means, allows for the derivation of an explicit estimator of such latent structure. This approach serves both as a latent variable model estimator and as a dimension-reduction tool for a high-dimensional data matrix composed of many related variables. Our theoretical results are verified by simulation studies and an application to genomic data.
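The abstract does not spell out the estimator, so the following is only a minimal sketch of a second-moment construction under stated assumptions, not the authors' implementation. It assumes an m × n data matrix Y (m variables, n observations) whose entries are independent given a mean matrix M = ΦZ, where Z is an r × n latent matrix, and a quadratic variance function Var(y_ij) = a·m_ij² + b·m_ij + c. Because E[y_ij²] = (1 + a)·m_ij² + b·m_ij + c, the quantity (a·y_ij² + b·y_ij + c)/(1 + a) is an unbiased estimate of Var(y_ij). Subtracting its column averages from the diagonal of YᵀY/m therefore leaves a matrix that approximates MᵀM/m for large m, and its top r eigenvectors estimate the row space of Z. The helper names (latent_row_space, simulate_poisson, subspace_distance) and the simulation settings are hypothetical choices for illustration.

```python
import numpy as np


def latent_row_space(Y, r, a=0.0, b=1.0, c=0.0):
    """Hypothetical helper: estimate an orthonormal n x r basis of the row
    space of Z, ASSUMING E[Y] = Phi @ Z with independent entries and
    Var(y_ij) = a*m_ij**2 + b*m_ij + c. Defaults (a=0, b=1, c=0) are Poisson."""
    if np.isclose(a, -1.0):
        raise ValueError("correction undefined for a = -1 (e.g. Bernoulli)")
    m = Y.shape[0]
    # Unbiased per-entry variance estimate: since E[y^2] = (1+a)m^2 + bm + c,
    # E[(a*y^2 + b*y + c) / (1+a)] = a*m^2 + b*m + c.
    delta = (a * Y**2 + b * Y + c) / (1.0 + a)
    # Second-moment matrix over the m variables with the diagonal variance
    # bias removed; for large m it approximates M.T @ M / m.
    G = Y.T @ Y / m - np.diag(delta.mean(axis=0))
    # eigh returns eigenvalues in ascending order; keep the top-r eigenvectors.
    _, eigvecs = np.linalg.eigh(G)
    return eigvecs[:, ::-1][:, :r]


def simulate_poisson(m, n, r, seed=0):
    """Hypothetical simulation: Poisson counts with mean matrix Phi @ Z."""
    rng = np.random.default_rng(seed)
    Phi = rng.uniform(0.0, 2.0, size=(m, r))  # nonnegative loadings
    Z = rng.uniform(0.0, 4.0, size=(r, n))    # nonnegative latent variables
    Y = rng.poisson(Phi @ Z).astype(float)
    return Y, Z


def subspace_distance(U, V):
    """Sine of the largest principal angle between the column spaces of two
    matrices with orthonormal columns (spectral norm of projector difference)."""
    return np.linalg.norm(U @ U.T - V @ V.T, 2)


if __name__ == "__main__":
    Y, Z = simulate_poisson(m=20000, n=30, r=2, seed=1)
    U_hat = latent_row_space(Y, r=2)          # Poisson: a=0, b=1, c=0
    Q_true, _ = np.linalg.qr(Z.T)             # orthonormal basis of row space of Z
    print(subspace_distance(U_hat, Q_true))   # small when recovery succeeds
```

For Poisson data a = 0, b = 1, c = 0; for normal data with known variance σ², a = 0, b = 0, c = σ². Under these assumptions the printed distance should shrink as m grows. Bernoulli data (a = −1) is excluded because the diagonal correction is undefined there.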