SIJan 8, 2023
Community Detection with Known, Unknown, or Partially Known Auxiliary Latent VariablesMohammad Esmaeili, Aria Nosratinia
Empirical observations suggest that in practice, community membership does not completely explain the dependency between the edges of an observation graph. The residual dependence of the graph edges are modeled in this paper, to first order, by auxiliary node latent variables that affect the statistics of the graph edges but carry no information about the communities of interest. We then study community detection in graphs obeying the stochastic block model and censored block model with auxiliary latent variables. We analyze the conditions for exact recovery when these auxiliary latent variables are unknown, representing unknown nuisance parameters or model mismatch. We also analyze exact recovery when these secondary latent variables have been either fully or partially revealed. Finally, we propose a semidefinite programming algorithm for recovering the desired labels when the secondary labels are either known or unknown. We show that exact recovery is possible by semidefinite programming down to the respective maximum likelihood exact recovery threshold.
MLMay 6, 2021
Semidefinite Programming for Community Detection with Side InformationMohammad Esmaeili, Hussein Metwaly Saad, Aria Nosratinia
This paper produces an efficient Semidefinite Programming (SDP) solution for community detection that incorporates non-graph data, which in this context is known as side information. SDP is an efficient solution for standard community detection on graphs. We formulate a semi-definite relaxation for the maximum likelihood estimation of node labels, subject to observing both graph and non-graph data. This formulation is distinct from the SDP solution of standard community detection, but maintains its desirable properties. We calculate the exact recovery threshold for three types of non-graph information, which in this paper are called side information: partially revealed labels, noisy labels, as well as multiple observations (features) per node with arbitrary but finite cardinality. We find that SDP has the same exact recovery threshold in the presence of side information as maximum likelihood with side information. Thus, the methods developed herein are computationally efficient as well as asymptotically accurate for the solution of community detection in the presence of side information. Simulations show that the asymptotic results of this paper can also shed light on the performance of SDP for graphs of modest size.
SIFeb 8, 2021
Community Detection: Exact Recovery in Weighted GraphsMohammad Esmaeili, Aria Nosratinia
In community detection, the exact recovery of communities (clusters) has been mainly investigated under the general stochastic block model with edges drawn from Bernoulli distributions. This paper considers the exact recovery of communities in a complete graph in which the graph edges are drawn from either a set of Gaussian distributions with community-dependent means and variances, or a set of exponential distributions with community-dependent means. For each case, we introduce a new semi-metric that describes sufficient and necessary conditions of exact recovery. The necessary and sufficient conditions are asymptotically tight. The analysis is also extended to incomplete, fully connected weighted graphs.
LGSep 29, 2020
Semi-Supervised Node Classification by Graph Convolutional Networks and Extracted Side InformationMohammad Esmaeili, Aria Nosratinia
The nodes of a graph existing in a cluster are more likely to connect to each other than with other nodes in the graph. Then revealing some information about some nodes, the structure of the graph (graph edges) provides this opportunity to know more information about other nodes. From this perspective, this paper revisits the node classification task in a semi-supervised scenario by graph convolutional networks (GCNs). The goal is to benefit from the flow of information that circulates around the revealed node labels. The contribution of this paper is twofold. First, this paper provides a method for extracting side information from a graph realization. Then a new GCN architecture is presented that combines the output of traditional GCN and the extracted side information. Another contribution of this paper is relevant to non-graph observations (independent side information) that exists beside a graph realization in many applications. Indeed, the extracted side information can be replaced by a sequence of side information that is independent of the graph structure. For both cases, the experiments on synthetic and real-world datasets demonstrate that the proposed model achieves a higher prediction accuracy in comparison to the existing state-of-the-art methods for the node classification task.
SIJan 8, 2019
EXIT Analysis for Community DetectionHussein Saad, Aria Nosratinia
This paper employs the extrinsic information transfer (EXIT) method, a technique imported from the analysis of the iterative decoding of error control codes, to study the performance of belief propagation in community detection in the presence of side information. We consider both the detection of a single (hidden) community, as well as the problem of identifying two symmetric communities. For single community detection, this paper demonstrates the suitability of EXIT to predict the asymptotic phase transition for weak recovery. More importantly, EXIT analysis is leveraged to produce useful insights such as the performance of belief propagation near the threshold. For two symmetric communities, the asymptotic residual error for belief propagation is calculated under finite-alphabet side information, generalizing a previous result with noisy labels. EXIT analysis is used to illuminate the effect of side information on community detection, its relative importance depending on the correlation of the graphical information with node labels, as well as the effect of side information on residual errors.
ITSep 5, 2018
Recovering a Single Community with Side InformationHussein Saad, Aria Nosratinia
We study the effect of the quality and quantity of side information on the recovery of a hidden community of size $K=o(n)$ in a graph of size $n$. Side information for each node in the graph is modeled by a random vector with the following features: either the dimension of the vector is allowed to vary with $n$, while log-likelihood ratio (LLR) of each component with respect to the node label is fixed, or the LLR is allowed to vary and the vector dimension is fixed. These two models represent the variation in quality and quantity of side information. Under maximum likelihood detection, we calculate tight necessary and sufficient conditions for exact recovery of the labels. We demonstrate how side information needs to evolve with $n$ in terms of either its quantity, or quality, to improve the exact recovery threshold. A similar set of results are obtained for weak recovery. Under belief propagation, tight necessary and sufficient conditions for weak recovery are calculated when the LLRs are constant, and sufficient conditions when the LLRs vary with $n$. Moreover, we design and analyze a local voting procedure using side information that can achieve exact recovery when applied after belief propagation. The results for belief propagation are validated via simulations on finite synthetic data-sets, showing that the asymptotic results of this paper can also shed light on the performance at finite $n$.