LG · CV · ML · Oct 6, 2015

Large-scale subspace clustering using sketching and validation

arXiv:1510.01628v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the scalability problem in subspace clustering for big data applications, offering an incremental improvement over existing methods.

The paper tackles the high computational complexity of subspace clustering on large-scale data by introducing a randomized sketching and validation method (SkeVa-SC). Extensive tests on synthetic and real data show that it achieves competitive clustering accuracy at a reduced computational cost.

The massive amounts of data generated and communicated nowadays present major challenges for processing. While capable of successfully classifying nonlinearly separable objects in various settings, subspace clustering (SC) methods incur prohibitively high computational complexity when processing large-scale data. Inspired by the random sample consensus (RANSAC) approach to robust regression, the present paper introduces a randomized scheme for SC, termed sketching and validation SC (SkeVa-SC), tailored for large-scale data. At the heart of SkeVa-SC lies a randomized scheme that approximates the underlying probability density function of the observed data via kernel smoothing arguments. Sparsity in data representations is also exploited to reduce the computational burden of SC while achieving high clustering accuracy. Performance analysis as well as extensive numerical tests on synthetic and real data corroborate the potential of SkeVa-SC and its competitive performance relative to state-of-the-art scalable SC approaches.

Keywords: Subspace clustering, big data, kernel smoothing, randomization, sketching, validation, sparsity.
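To make the RANSAC-style "sketch, then validate" idea concrete, here is a minimal NumPy sketch. It is not the paper's SkeVa-SC algorithm. It assumes two stand-ins: K-subspaces (alternating SVD fits and residual-based assignment) replaces the sparse SC step on each small sketch, and a Gaussian-kernel score on point-to-subspace residuals replaces the paper's kernel-smoothing validation. All function names and parameters (`fit_subspaces`, `skeva_sc_sketch`, `n_sketch`, `n_val`, `draws`, `h`) are hypothetical.

```python
import numpy as np


def residuals(X, bases):
    # Distance of each row of X to each subspace span(U): ||x - U U^T x||.
    return np.stack([np.linalg.norm(X - X @ U @ U.T, axis=1) for U in bases], axis=1)


def fit_subspaces(X, K, d, rng, iters=30):
    # Stand-in SC step (assumption): K-subspaces on a small sketch.
    N, D = X.shape
    labels = rng.integers(K, size=N)
    bases = [None] * K
    for _ in range(iters):
        for k in range(K):
            Xk = X[labels == k]
            if len(Xk) < d:
                # Degenerate cluster: re-seed with a random orthonormal basis.
                bases[k], _ = np.linalg.qr(rng.standard_normal((D, d)))
            else:
                # Top-d right singular vectors span the best-fit subspace through 0.
                _, _, Vt = np.linalg.svd(Xk, full_matrices=False)
                bases[k] = Vt[:d].T
        new = np.argmin(residuals(X, bases), axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return bases


def skeva_sc_sketch(X, K, d, n_sketch=50, n_val=100, draws=20, h=0.1, seed=0):
    # Hypothetical sketch-and-validate loop: cluster a small random sketch,
    # score the fitted subspaces on held-out points, keep the best draw.
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    best_score, best_bases = -np.inf, None
    for _ in range(draws):
        idx = rng.choice(N, size=n_sketch, replace=False)
        bases = fit_subspaces(X[idx], K, d, rng)
        # Validation points are drawn from outside the sketch.
        rest = np.setdiff1d(np.arange(N), idx)
        val = rng.choice(rest, size=min(n_val, rest.size), replace=False)
        r = residuals(X[val], bases).min(axis=1)
        # Gaussian-kernel score (assumption): close to 1 when points lie on a subspace.
        score = np.mean(np.exp(-r**2 / (2 * h**2)))
        if score > best_score:
            best_score, best_bases = score, bases
    # Only the final assignment touches all N points.
    labels = np.argmin(residuals(X, best_bases), axis=1)
    return labels, best_score


if __name__ == "__main__":
    # Toy data: points near two lines (1-D subspaces) in R^3.
    rng = np.random.default_rng(1)
    n = 300
    a = rng.uniform(1, 3, n) * rng.choice([-1, 1], n)
    b = rng.uniform(1, 3, n) * rng.choice([-1, 1], n)
    X = np.vstack([np.outer(a, [1.0, 0, 0]), np.outer(b, [0, 1.0, 0])])
    X = X + 0.01 * rng.standard_normal(X.shape)
    truth = np.repeat([0, 1], n)
    labels, score = skeva_sc_sketch(X, K=2, d=1)
    acc = max(np.mean(labels == truth), np.mean(labels != truth))
    print(f"accuracy={acc:.3f}, validation score={score:.3f}")
```

The expensive clustering step only ever sees `n_sketch` points. The full data set is touched once, for a cheap residual-based assignment. This is the source of the savings the abstract describes, although the actual SkeVa-SC clustering and validation criteria differ from the stand-ins used here.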
