LG, ML | Feb 14, 2018

Robust Continuous Co-Clustering

arXiv:1802.05036v1 | 1 citation
Originality: Incremental advance
AI Analysis

ROCCO provides a general-purpose co-clustering tool for cross-domain practitioners, including non-experts, by addressing scalability and robustness on real-world datasets.

The paper tackles the problem of co-clustering, which groups samples and features simultaneously, by introducing ROCCO, a robust and scalable algorithm that achieves state-of-the-art performance in domains like Biomedicine and Text Mining, with linear empirical scalability and low sensitivity to hyperparameters.

Clustering consists of grouping samples together given their similar properties. The problem of simultaneously modeling groups of samples and features is known as Co-Clustering. This paper introduces ROCCO, a Robust Continuous Co-Clustering algorithm. ROCCO is a scalable, hyperparameter-free, easy-to-use and ready-to-use algorithm for addressing Co-Clustering problems in practice over massive cross-domain datasets. It operates by learning a graph-based two-sided representation of the input matrix. The underlying proposed optimization problem is non-convex, which assures a flexible pool of solutions. Moreover, we prove that it can be solved with near-linear time complexity in the input size. An exhaustive large-scale experimental testbed, conducted with both synthetic and real-world datasets, demonstrates ROCCO's properties in practice: (i) state-of-the-art performance in cross-domain real-world problems including Biomedicine and Text Mining; (ii) very low sensitivity to hyperparameter settings; (iii) robustness to noise; and (iv) linear empirical scalability. These results highlight ROCCO as a powerful general-purpose co-clustering algorithm for cross-domain practitioners, regardless of their technical background.
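To make the idea of a "graph-based two-sided representation" concrete, here is a minimal sketch, not ROCCO itself. The abstract does not say how the graphs are built, so the sketch assumes one nearest-neighbour graph over the rows (samples) and one over the columns (features), using Euclidean distance. The function names `knn_graph` and `two_sided_graphs` and the toy matrix `X` are hypothetical and chosen only for illustration.

```python
from math import dist


def knn_graph(points, k):
    """Return the undirected k-nearest-neighbour edge set over `points`.

    Assumption: plain Euclidean distance; the paper may use something else.
    """
    edges = set()
    for i, p in enumerate(points):
        # Rank every other point by its distance to p and keep the k closest.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def two_sided_graphs(matrix, k=1):
    """Build one graph over rows (samples) and one over columns (features)."""
    columns = [list(col) for col in zip(*matrix)]  # transpose the matrix
    return knn_graph(matrix, k), knn_graph(columns, k)


# Toy 4x4 matrix with two visible blocks: rows 0-1 pair with columns 0-1,
# and rows 2-3 pair with columns 2-3.
X = [
    [5.0, 5.1, 0.0, 0.2],
    [4.9, 5.0, 0.1, 0.0],
    [0.0, 0.1, 3.0, 3.1],
    [0.2, 0.0, 2.9, 3.0],
]

row_edges, col_edges = two_sided_graphs(X, k=1)
print(sorted(row_edges))  # [(0, 1), (2, 3)]
print(sorted(col_edges))  # [(0, 1), (2, 3)]
```

On this toy input both graphs split into the same two components, which is the block structure a co-clustering method aims to recover on the sample side and the feature side at once.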

