LG · ML · Mar 28, 2017

Hybrid Clustering based on Content and Connection Structure using Joint Nonnegative Matrix Factorization

arXiv:1703.09646v1 · 34 citations
Originality: Incremental advance
AI Analysis

This is an incremental improvement for researchers and practitioners in data mining and machine learning. It enables better latent information discovery in data sets that combine text content with connection structure, such as citation networks.

The paper tackles the problem of clustering data that has both text content and connection structure. It proposes a hybrid method that jointly optimizes the Nonnegative Matrix Factorization (NMF) and Symmetric NMF (SymNMF) objectives, and reports higher-quality clustering than using content or structure alone.

We present a hybrid method for latent information discovery on data sets containing both text content and connection structure, based on constrained low rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for the joint NMF objective function, based on a block coordinate descent (BCD) framework. The proposed hybrid method discovers content associations via latent connections found using SymNMF. The method can also be applied with a natural conversion of the problem when a hypergraph formulation is used or the content is associated with hypergraph edges. Experimental results show that by simultaneously utilizing both content and connection structure, our hybrid method produces higher quality clustering results compared to other NMF clustering methods that use content alone (standard NMF) or connection structure alone (SymNMF). We also present some interesting applications to several types of real-world data, such as citation recommendations of papers. The hybrid method proposed in this paper can also be applied to general data expressed with both feature space vectors and pairwise similarities, and can be extended to the case with multiple feature spaces or multiple similarity measures.
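
To make the idea of a joint objective concrete, here is a minimal sketch, not the paper's algorithm. It assumes one plausible way to couple the two terms: minimize ||X - WH||_F^2 + alpha * ||S - Hhat^T H||_F^2 + beta * ||Hhat - H||_F^2 with W, H, Hhat >= 0, where X is a features-by-documents matrix and S is a symmetric document similarity matrix. The auxiliary factor Hhat, the weights alpha and beta, and the function name joint_nmf are all assumptions made for illustration. Each block is updated with a single projected-gradient step (step size 1/L, with L the Lipschitz constant of that block's gradient) instead of an exact nonnegative least squares solve, so this is only a simple stand-in for a BCD solver.

```python
import numpy as np

def joint_nmf(X, S, k, alpha=1.0, beta=1.0, n_iter=200, seed=0):
    """Illustrative joint NMF + SymNMF sketch (assumed objective, see text).

    X: (m, n) nonnegative feature matrix, one column per item.
    S: (n, n) symmetric nonnegative similarity matrix.
    Returns W (m, k), H (k, n) and the objective value per outer iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    Hhat = H.copy()  # auxiliary copy of H that makes each block quadratic

    def objective():
        return (np.linalg.norm(X - W @ H) ** 2
                + alpha * np.linalg.norm(S - Hhat.T @ H) ** 2
                + beta * np.linalg.norm(Hhat - H) ** 2)

    history = [objective()]
    tiny = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # Block 1: W, with H fixed (content term only).
        L = 2 * np.linalg.norm(H @ H.T, 2) + tiny
        grad = 2 * (W @ H - X) @ H.T
        W = np.maximum(W - grad / L, 0)

        # Block 2: Hhat, with H fixed (structure term + coupling term).
        L = 2 * (alpha * np.linalg.norm(H @ H.T, 2) + beta) + tiny
        grad = 2 * alpha * H @ (H.T @ Hhat - S) + 2 * beta * (Hhat - H)
        Hhat = np.maximum(Hhat - grad / L, 0)

        # Block 3: H, with W and Hhat fixed (all three terms).
        L = 2 * (np.linalg.norm(W.T @ W, 2)
                 + alpha * np.linalg.norm(Hhat @ Hhat.T, 2) + beta) + tiny
        grad = (2 * W.T @ (W @ H - X)
                + 2 * alpha * Hhat @ (Hhat.T @ H - S)
                + 2 * beta * (H - Hhat))
        H = np.maximum(H - grad / L, 0)

        history.append(objective())
    return W, H, history

# Toy data: 8 items in two planted groups, 6 features.
rng = np.random.default_rng(1)
X = np.zeros((6, 8))
X[:3, :4] = 1.0
X[3:, 4:] = 1.0
X += 0.05 * rng.random(X.shape)
S = np.zeros((8, 8))
S[:4, :4] = 1.0
S[4:, 4:] = 1.0

W, H, history = joint_nmf(X, S, k=2)
labels = H.argmax(axis=0)  # cluster of each item = largest entry in its column
```

Setting alpha to 0 leaves only the content term and the coupling penalty, while a large alpha makes the similarity matrix dominate. That is one way to see how such a method could interpolate between content-only and structure-only clustering.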

