Active covariance estimation by random sub-sampling of variables
This work addresses covariance estimation for high-dimensional data with partial observations. The contribution is incremental: it builds on existing methods by optimizing the sub-sampling probabilities within an active learning framework.
The paper tackles covariance matrix estimation when only subsets of vector coordinates are observed per sample, using an unbiased estimator with error bounds linking sub-sampling probabilities to covariance entries. It applies this in an active learning setting with limited observations, proposing optimal sub-sampling probabilities and an algorithm for estimation.
We study covariance matrix estimation for the case of partially observed random vectors, where different samples contain different subsets of vector coordinates. Each observation is the product of the variable of interest with a $\{0,1\}$-valued Bernoulli random variable. We analyze an unbiased covariance estimator under this model, and derive an error bound that reveals relations between the sub-sampling probabilities and the entries of the covariance matrix. We apply our analysis in an active learning framework, where the expected number of observed variables is small compared to the dimension of the vector of interest, and propose a design of optimal sub-sampling probabilities and an active covariance matrix estimation algorithm.
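The observation model can be illustrated with a minimal sketch. It is not necessarily the exact estimator analyzed in the paper. It assumes zero-mean data, independent Bernoulli masks with known per-coordinate probabilities `p`, and the standard inverse-probability reweighting: off-diagonal entries of the empirical second-moment matrix are divided by `p[i] * p[j]` and diagonal entries by `p[i]`. The function names `subsample` and `masked_covariance_estimate` are hypothetical, chosen for illustration.

```python
import numpy as np


def subsample(X, p, rng):
    """Multiply each coordinate by an independent Bernoulli(p[i]) variable.

    X has shape (n, d); p has shape (d,). Unobserved entries become 0.
    """
    mask = rng.random(X.shape) < p  # p broadcasts across the n rows
    return X * mask


def masked_covariance_estimate(Y, p):
    """Inverse-probability-weighted covariance estimate (assumes zero mean).

    Under the assumed model, E[y_i * y_j] = p_i * p_j * Sigma_ij for i != j
    and E[y_i ** 2] = p_i * Sigma_ii, so dividing by these factors removes
    the bias introduced by the masking.
    """
    p = np.asarray(p, dtype=float)
    n = Y.shape[0]
    second_moment = Y.T @ Y / n
    weights = np.outer(p, p)
    np.fill_diagonal(weights, p)  # diagonal entries are observed w.p. p_i
    return second_moment / weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # illustrative covariance
    X = rng.multivariate_normal(np.zeros(2), sigma, size=50_000)
    p = np.array([0.4, 0.7])  # illustrative sub-sampling probabilities
    print(masked_covariance_estimate(subsample(X, p, rng), p))
```

Smaller probabilities inflate the variance of the corresponding entries through the division by `p[i] * p[j]`, which is the kind of trade-off between sub-sampling probabilities and covariance entries that the error bound describes.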