Community Detection with Known, Unknown, or Partially Known Auxiliary Latent Variables
This work addresses the challenge of improving community detection accuracy in networks with nuisance parameters, an incremental advance in graph analysis methods.
The paper tackles the problem of community detection in graphs where residual dependencies beyond community membership are modeled by auxiliary latent variables, analyzing exact recovery conditions and proposing a semidefinite programming algorithm that achieves recovery down to the maximum likelihood threshold.
Empirical observations suggest that, in practice, community membership does not completely explain the dependency between the edges of an observation graph. In this paper, the residual dependence of the graph edges is modeled, to first order, by auxiliary node latent variables that affect the statistics of the graph edges but carry no information about the communities of interest. We then study community detection in graphs obeying the stochastic block model and the censored block model with auxiliary latent variables. We analyze the conditions for exact recovery when these auxiliary latent variables are unknown, representing unknown nuisance parameters or model mismatch. We also analyze exact recovery when these auxiliary latent variables have been either fully or partially revealed. Finally, we propose a semidefinite programming algorithm for recovering the desired labels when the auxiliary labels are either known or unknown. We show that exact recovery by semidefinite programming is possible down to the respective maximum likelihood exact recovery threshold.
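To make the modeling idea concrete, the following is a minimal sketch of sampling a graph whose edge statistics depend on both a community label and an auxiliary label at each node. It is an illustration under stated assumptions, not the paper's exact parameterization: the binary labels, the function name `sample_graph_with_auxiliary`, and the probability table `edge_prob` are all hypothetical, and the numeric probabilities are arbitrary.

```python
import random

def sample_graph_with_auxiliary(n, edge_prob, seed=0):
    """Sample an undirected graph on n nodes.

    Each node i has a community label x[i] (the label of interest) and an
    auxiliary label y[i] (a nuisance variable), both drawn uniformly from
    {-1, +1}. The probability of edge (i, j) is looked up by the pair
    (same community?, same auxiliary label?).
    """
    rng = random.Random(seed)
    x = [rng.choice((-1, 1)) for _ in range(n)]  # community labels
    y = [rng.choice((-1, 1)) for _ in range(n)]  # auxiliary labels
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = edge_prob[(x[i] == x[j], y[i] == y[j])]
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # keep the matrix symmetric
    return adj, x, y

# Illustrative probabilities only (assumed, not taken from the paper).
# Keys are (same community, same auxiliary label).
edge_prob = {
    (True, True): 0.6,
    (True, False): 0.4,
    (False, True): 0.3,
    (False, False): 0.1,
}

adj, x, y = sample_graph_with_auxiliary(20, edge_prob, seed=1)
```

In this sketch the auxiliary labels `y` change how likely an edge is without revealing anything about `x`, which is the sense in which they act as nuisance parameters. A detector that sees only `adj` corresponds to the unknown case; one that is also given `y`, or a subset of its entries, corresponds to the fully or partially revealed cases.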