Effective Data-aware Covariance Estimator from Compressed Data
This work addresses covariance estimation for applications that handle large-scale data, but it appears incremental: it extends existing compression-based methods with a weighted sampling approach.
The paper tackles the problem of estimating covariance matrices from massive high-dimensional distributed data by proposing DACE, a data-aware weighted sampling estimator that provides unbiased and more accurate estimation under the same compression ratio, with experiments showing superior performance on synthetic and real-world datasets.
Estimating the covariance matrix from massive, high-dimensional, distributed data is important for various real-world applications. In this paper, we propose DACE, a data-aware covariance matrix estimator based on weighted sampling, which provides an unbiased estimate of the covariance matrix and attains more accurate estimation under the same compression ratio. Moreover, we extend DACE to multiclass classification problems with theoretical justification, and we conduct extensive experiments on both synthetic and real-world datasets to demonstrate its superior performance.
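To make the idea of an unbiased covariance estimate from weighted sampling concrete, here is a minimal sketch in Python. It is not the DACE algorithm from the paper, whose sampling weights and rescaling are not given in this text. It is a generic importance-sampling construction under stated assumptions: the data are zero-mean, each vector keeps m >= 2 coordinates sampled with replacement, and the sampling probabilities mix magnitude-proportional and uniform weights. The function names and the mixing parameter `alpha` are hypothetical.

```python
import random


def sampling_probs(x, alpha=0.5):
    """Data-aware probabilities: a mix of |x_k|-proportional and uniform weights.

    The mixing rule and alpha are illustrative assumptions, not the paper's choice.
    """
    d = len(x)
    total = sum(abs(v) for v in x)
    if total == 0:
        return [1.0 / d] * d
    return [alpha * abs(v) / total + (1 - alpha) / d for v in x]


def compress(x, m, probs, rng):
    """Keep m coordinates sampled with replacement, stored as (index, x_k / p_k)."""
    idx = rng.choices(range(len(x)), weights=probs, k=m)
    return [(k, x[k] / probs[k]) for k in idx]


def outer_estimate(samples, d):
    """Unbiased estimate of x x^T from m >= 2 samples.

    Averages z_t z_s^T over ordered pairs t != s, where z_t = (x_k / p_k) e_k.
    Distinct draws are independent with E[z_t] = x, so each pair has mean x x^T.
    """
    m = len(samples)
    est = [[0.0] * d for _ in range(d)]
    for t, (i, zi) in enumerate(samples):
        for s, (j, zj) in enumerate(samples):
            if t != s:
                est[i][j] += zi * zj / (m * (m - 1))
    return est


def covariance_estimate(data, m, alpha=0.5, seed=0):
    """Average the per-vector estimates; assumes the data are already centered."""
    rng = random.Random(seed)
    d = len(data[0])
    cov = [[0.0] * d for _ in range(d)]
    for x in data:
        probs = sampling_probs(x, alpha)
        est = outer_estimate(compress(x, m, probs, rng), d)
        for i in range(d):
            for j in range(d):
                cov[i][j] += est[i][j] / len(data)
    return cov


if __name__ == "__main__":
    data = [[3.0, -1.0, 0.5, 0.0], [0.2, 2.0, -1.5, 0.1], [1.0, 1.0, 1.0, 1.0]]
    for row in covariance_estimate(data, m=2):
        print(row)
```

Each vector is stored as m (index, value) pairs instead of d values, so the compression ratio in this sketch is roughly m/d. Any probabilities that are positive wherever x is nonzero keep the estimate unbiased; the choice of probabilities affects only the variance, which is where a data-aware scheme can gain accuracy at a fixed compression ratio.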