Identifying Outliers in Large Matrices via Randomized Adaptive Compressive Sampling
This addresses outlier detection in large datasets for applications like robust collaborative filtering and computer vision, though it appears incremental as an adaptive method for a known bottleneck.
The paper tackles the problem of locating outlier columns in large, low-rank matrices by proposing a two-step adaptive sensing and inference approach, achieving accurate identification with as few as the squared rank plus the number of outliers times constant and logarithmic factors of linear summaries.
This paper examines the problem of locating outlier columns in a large, otherwise low-rank, matrix. We propose a simple two-step adaptive sensing and inference approach and establish theoretical guarantees for its performance; our results show that accurate outlier identification is achievable using very few linear summaries of the original data matrix -- as few as the squared rank of the low-rank component plus the number of outliers, times constant and logarithmic factors. We demonstrate the performance of our approach experimentally in two stylized applications, one motivated by robust collaborative filtering tasks, and the other by saliency map estimation tasks arising in computer vision and automated surveillance, and also investigate extensions to settings where the data are noisy, or possibly incomplete.