MLOct 1, 2013

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

Jiaming Xu, Rui Wu, Kai Zhu, Bruce Hajek, R. Srikant, Lei Ying

arXiv:1310.0512v245 citations

Originality Incremental advance

AI Analysis

This addresses a clustering problem for binary data in applications like recommendation systems, but it is incremental as it builds on existing matrix clustering methods.

The paper tackles the problem of exactly recovering row and column clusters from noisy binary matrices with limited observations, deriving a lower bound on observations needed and proposing three algorithms that show smooth trade-offs between computational complexity and observation requirements.

In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we propose three algorithms with different running time and compare the number of observations needed by them for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity when increasingly more observations are available.

View on arXiv PDF

Similar