SIDec 4, 2021
Mixed membership distribution-free modelHuan Qing, Jingli Wang
We consider the problem of community detection in overlapping weighted networks, where nodes can belong to multiple communities and edge weights can be finite real numbers. To model such complex networks, we propose a general framework - the mixed membership distribution-free (MMDF) model. MMDF has no distribution constraints of edge weights and can be viewed as generalizations of some previous models, including the well-known mixed membership stochastic blockmodels. Especially, overlapping signed networks with latent community structures can also be generated from our model. We use an efficient spectral algorithm with a theoretical guarantee of convergence rate to estimate community memberships under the model. We also propose the fuzzy weighted modularity to evaluate the quality of community detection for overlapping weighted networks with positive and negative edge weights. We then provide a method to determine the number of communities for weighted networks by taking advantage of our fuzzy weighted modularity. Numerical simulations and real data applications are carried out to demonstrate the usefulness of our mixed membership distribution-free model and our fuzzy weighted modularity.
MLSep 21, 2021
Community detection for weighted bipartite networksHuan Qing, Jingli Wang
The bipartite network appears in various areas, such as biology, sociology, physiology, and computer science. \cite{rohe2016co} proposed Stochastic co-Blockmodel (ScBM) as a tool for detecting community structure of binary bipartite graph data in network studies. However, ScBM completely ignores edge weight and is unable to explain the block structure of a weighted bipartite network. Here, to model a weighted bipartite network, we introduce a Bipartite Distribution-Free model by releasing ScBM's distribution restriction. We also build an extension of the proposed model by considering the variation of node degree. Our models do not require a specific distribution on generating elements of the adjacency matrix but only a block structure on the expected adjacency matrix. Spectral algorithms with theoretical guarantees on the consistent estimation of node labels are presented to identify communities. Our proposed methods are illustrated by simulated and empirical examples.
MLJan 7, 2021
Directed mixed membership stochastic blockmodelHuan Qing, Jingli Wang
Mixed membership problem for undirected network has been well studied in network analysis recent years. However, the more general case of mixed membership for directed network in which nodes can belong to multiple communities remains a challenge. Here, we propose an interpretable and identifiable model: directed mixed membership stochastic blockmodel (DiMMSB) for directed mixed membership networks. DiMMSB allows that row nodes and column nodes of the adjacency matrix can be different and these nodes may have distinct community structure in a directed network. We also develop an efficient spectral algorithm called DiSP designed based on simplex structures inherent in the left and right singular vectors of the population adjacency matrix to estimate the mixed memberships for both row nodes and column nodes in a directed network. We show that DiSP is asymptotically consistent under mild conditions by providing error bounds for the inferred membership vectors of each row node and each column node using delicate spectral analysis. Numerical results on computer-generated directed mixed membership networks support our theoretical findings and show that our DiSP outperforms its competitor in both error rates and run-time. Applications of DiSP to real-world directed networks demonstrate the advantages of DiSP in studying the asymmetric structure of directed networks.
MLDec 17, 2020
Estimating Mixed-Memberships Using the Symmetric Laplacian Inverse MatrixHuan Qing, Jingli Wang
Mixed membership community detection is a challenging problem. In this paper, to detect mixed memberships, we propose a new method Mixed-SLIM which is a spectral clustering method on the symmetrized Laplacian inverse matrix under the degree-corrected mixed membership model. We provide theoretical bounds for the estimation error on the proposed algorithm and its regularized version under mild conditions. Meanwhile, we provide some extensions of the proposed method to deal with large networks in practice. These Mixed-SLIM methods outperform state-of-art methods in simulations and substantial empirical datasets for both community detection and mixed membership community detection problems.
SIDec 7, 2020
Mixed-SCORE+ for mixed membership community detectionHuan Qing, Jingli Wang
Mixed-SCORE is a recent approach for mixed membership community detection proposed by Jin et al. (2017) which is an extension of SCORE (Jin, 2015). In the note Jin et al. (2018), the authors propose SCORE+ as an improvement of SCORE to handle with weak signal networks. In this paper, we propose a method called Mixed-SCORE+ designed based on the Mixed-SCORE and SCORE+, therefore Mixed-SCORE+ inherits nice properties of both Mixed-SCORE and SCORE+. In the proposed method, we consider K+1 eigenvectors when there are K communities to detect weak signal networks. And we also construct vertices hunting and membership reconstruction steps to solve the problem of mixed membership community detection. Compared with several benchmark methods, numerical results show that Mixed-SCORE+ provides a significant improvement on the Polblogs network and two weak signal networks Simmons and Caltech, with error rates 54/1222, 125/1137 and 94/590, respectively. Furthermore, Mixed-SCORE+ enjoys excellent performances on the SNAP ego-networks.
SINov 23, 2020
Consistency of regularized spectral clustering in degree-corrected mixed membership modelHuan Qing, Jingli Wang
Community detection in network analysis is an attractive research area recently. Here, under the degree-corrected mixed membership (DCMM) model, we propose an efficient approach called mixed regularized spectral clustering (Mixed-RSC for short) based on the regularized Laplacian matrix. Mixed-RSC is designed based on an ideal cone structure of the variant for the eigen-decomposition of the population regularized Laplacian matrix. We show that the algorithm is asymptotically consistent under mild conditions by providing error bounds for the inferred membership vector of each node. As a byproduct of our bound, we provide the theoretical optimal choice for the regularization parameter τ. To demonstrate the performance of our method, we apply it with previous benchmark methods on both simulated and real-world networks. To our knowledge, this is the first work to design spectral clustering algorithm for mixed membership community detection problem under DCMM model based on the application of regularized Laplacian matrix.
MLNov 12, 2020
An improved spectral clustering method for community detection under the degree-corrected stochastic blockmodelHuan Qing, Jingli Wang
For community detection problem, spectral clustering is a widely used method for detecting clusters in networks. In this paper, we propose an improved spectral clustering (ISC) approach under the degree corrected stochastic block model (DCSBM). ISC is designed based on the k-means clustering algorithm on the weighted leading K + 1 eigenvectors of a regularized Laplacian matrix where the weights are their corresponding eigenvalues. Theoretical analysis of ISC shows that under mild conditions the ISC yields stable consistent community detection. Numerical results show that ISC outperforms classical spectral clustering methods for community detection on both simulated and eight empirical networks. Especially, ISC provides a significant improvement on two weak signal networks Simmons and Caltech, with error rates of 121/1137 and 96/590, respectively.
MLNov 9, 2020
Dual regularized Laplacian spectral clustering methods on community detectionHuan Qing, Jingli Wang
Spectral clustering methods are widely used for detecting clusters in networks for community detection, while a small change on the graph Laplacian matrix could bring a dramatic improvement. In this paper, we propose a dual regularized graph Laplacian matrix and then employ it to three classical spectral clustering approaches under the degree-corrected stochastic block model. If the number of communities is known as $K$, we consider more than $K$ leading eigenvectors and weight them by their corresponding eigenvalues in the spectral clustering procedure to improve the performance. Three improved spectral clustering methods are dual regularized spectral clustering (DRSC) method, dual regularized spectral clustering on Ratios-of-eigenvectors (DRSCORE) method, and dual regularized symmetrized Laplacian inverse matrix (DRSLIM) method. Theoretical analysis of DRSC and DRSLIM show that under mild conditions DRSC and DRSLIM yield stable consistent community detection, moreover, DRSCORE returns perfect clustering under the ideal case. We compare the performances of DRSC, DRSCORE and DRSLIM with several spectral methods by substantial simulated networks and eight real-world networks.
MLNov 9, 2020
Community Detection by Principal Components Clustering MethodsHuan Qing, Jingli Wang
Based on the classical Degree Corrected Stochastic Blockmodel (DCSBM) model for network community detection problem, we propose two novel approaches: principal component clustering (PCC) and normalized principal component clustering (NPCC). Without any parameters to be estimated, the PCC method is simple to be implemented. Under mild conditions, we show that PCC yields consistent community detection. NPCC is designed based on the combination of the PCC and the RSC method (Qin & Rohe 2013). Population analysis for NPCC shows that NPCC returns perfect clustering for the ideal case under DCSBM. PCC and NPCC is illustrated through synthetic and real-world datasets. Numerical results show that NPCC provides a significant improvement compare with PCC and RSC. Moreover, NPCC inherits nice properties of PCC and RSC such that NPCC is insensitive to the number of eigenvectors to be clustered and the choosing of the tuning parameter. When dealing with two weak signal networks Simmons and Caltech, by considering one more eigenvectors for clustering, we provide two refinements PCC+ and NPCC+ of PCC and NPCC, respectively. Both two refinements algorithms provide improvement performances compared with their original algorithms. Especially, NPCC+ provides satisfactory performances on Simmons and Caltech, with error rates of 121/1137 and 96/590, respectively.