ML LG ST
Jun 18, 2018

Overlapping Clustering Models, and One (class) SVM to Bind Them All

Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti

arXiv:1806.06945v211.218 citations

Originality: Synthesis-oriented

AI Analysis

The paper offers a scalable, theoretically grounded method for overlapping clustering. Its contribution is incremental in that it applies an existing tool (the one-class SVM) to known models.

The paper tackles the problem of overlapping clustering in data like social networks and text by showing that a simple one-class SVM provides provably consistent parameter inference for a broad class of existing models, with experimental results demonstrating accuracy and scalability on simulated and real datasets.

People belong to multiple communities, words belong to multiple topics, and books cover multiple genres; overlapping clusters are commonplace. Many existing overlapping clustering methods model each person (or word, or book) as a non-negative weighted combination of "exemplars" who belong solely to one community, with some small noise. Geometrically, each person is a point on a cone whose corners are these exemplars. This basic form encompasses the widely used Mixed Membership Stochastic Blockmodel of networks (Airoldi et al., 2008) and its degree-corrected variants (Jin et al., 2017), as well as topic models such as LDA (Blei et al., 2003). We show that a simple one-class SVM yields provably consistent parameter inference for all such models, and scales to large datasets. Experimental results on several simulated and real datasets show our algorithm (called SVM-cone) is both accurate and scalable.
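The cone geometry described above can be made concrete with a small sketch. It is not the paper's SVM-cone algorithm: it is a noiseless two-dimensional toy with two exemplars, and it replaces the one-class SVM step with a simple search for the extreme angles, which only works in 2-D. All function names, the exemplar coordinates, and the weights are illustrative assumptions.

```python
import math
import random


def cone_point(exemplars, weights):
    """Non-negative weighted combination of exemplar vectors."""
    dim = len(exemplars[0])
    return [sum(w * e[d] for w, e in zip(weights, exemplars)) for d in range(dim)]


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def find_corners_2d(points):
    """Indices of the points with the smallest and largest polar angle.

    A 2-D stand-in for the corner-finding step (assumption: all points
    lie in one half-plane, so angles do not wrap around).
    """
    angles = [math.atan2(p[1], p[0]) for p in points]
    lo = min(range(len(points)), key=angles.__getitem__)
    hi = max(range(len(points)), key=angles.__getitem__)
    return lo, hi


def recover_weights_2d(point, c0, c1):
    """Solve point = a*c0 + b*c1 for (a, b) with Cramer's rule."""
    det = c0[0] * c1[1] - c0[1] * c1[0]
    a = (point[0] * c1[1] - point[1] * c1[0]) / det
    b = (c0[0] * point[1] - c0[1] * point[0]) / det
    return a, b


# Hypothetical exemplars: two "pure" members, one per community.
exemplars = [[3.0, 1.0], [1.0, 2.0]]

random.seed(0)
# The first two points are pure (one zero weight); the rest are mixed.
weights = [[1.5, 0.0], [0.0, 0.7]]
weights += [[random.uniform(0.1, 2.0), random.uniform(0.1, 2.0)] for _ in range(20)]
points = [cone_point(exemplars, w) for w in weights]

# Normalizing removes each point's overall scale, leaving only its direction.
unit_points = [normalize(p) for p in points]
lo, hi = find_corners_2d(unit_points)
print("corner indices:", lo, hi)

# Express a mixed point in terms of the recovered corner directions.
a, b = recover_weights_2d(points[5], unit_points[lo], unit_points[hi])
print("relative membership:", a / (a + b), b / (a + b))
```

In this toy setting every mixed point has a direction strictly between the two exemplar directions, so the extreme angles pick out the pure points. The recovered coefficients are expressed relative to unit-length corners, so they match the generating weights only up to a per-exemplar scale.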
