Splitting Methods for Convex Bi-Clustering and Co-Clustering
This work addresses computational bottlenecks for researchers and practitioners using convex co-clustering methods, though it is incremental as it focuses on improving existing algorithms.
The paper tackles the computational inefficiency of convex formulations for bi-clustering and co-clustering by proposing three operator-splitting methods, with the Generalized ADMM showing far greater efficiency for large problems.
Co-Clustering, the problem of simultaneously identifying clusters across multiple aspects of a data set, is a natural generalization of clustering to higher-order structured data. Recent convex formulations of bi-clustering and tensor co-clustering, which shrink estimated centroids together using a convex fusion penalty, allow for global optimality guarantees and precise theoretical analysis, but their computational properties have been less well studied. In this note, we present three efficient operator-splitting methods for the convex co-clustering problem: a standard two-block ADMM, a Generalized ADMM which avoids an expensive tensor Sylvester equation in the primal update, and a three-block ADMM based on the operator splitting scheme of Davis and Yin. Theoretical complexity analysis suggests, and experimental evidence confirms, that the Generalized ADMM is far more efficient for large problems.