ML | Feb 14, 2018

Vertex nomination: The canonical sampling and the extended spectral nomination schemes

Jordan Yoder, Li Chen, Henry Pao, Eric Bridgeford, Keith Levin, Donniell Fishkind, Carey Priebe, Vince Lyzinski

arXiv:1802.04960v27.819 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently identifying interesting vertices in networks with limited labeled data. It is an incremental advance that builds on existing nomination schemes to improve their scalability and precision.

The paper tackles the vertex nomination problem in stochastic block models where only a few block labels are observed; the aim is to rank the unlabeled vertices so that interesting ones appear near the top of the list. It introduces two scalable schemes. The canonical sampling nomination scheme $\mathcal{L}^{CS}$ uses Markov chain Monte Carlo to approximate the optimal but intractable canonical scheme $\mathcal{L}^{C}$. The extended spectral partitioning nomination scheme $\mathcal{L}^{EP}$ uses a novel semisupervised clustering framework to improve on the precision of the spectral partitioning scheme $\mathcal{L}^{P}$. Simulation and real-data experiments illustrate the precision of the schemes and their empirical computational complexity.

Suppose that one particular block in a stochastic block model is of interest, but block labels are only observed for a few of the vertices in the network. Utilizing a graph realized from the model and the observed block labels, the vertex nomination task is to order the vertices with unobserved block labels into a ranked nomination list with the goal of having an abundance of interesting vertices near the top of the list. There are vertex nomination schemes in the literature, including the optimally precise canonical nomination scheme $\mathcal{L}^C$ and the consistent spectral partitioning nomination scheme $\mathcal{L}^P$. While the canonical nomination scheme $\mathcal{L}^C$ is provably optimally precise, it is computationally intractable, being impractical to implement even on modestly sized graphs. With this in mind, an approximation of the canonical scheme, denoted the canonical sampling nomination scheme $\mathcal{L}^{CS}$, is introduced; $\mathcal{L}^{CS}$ relies on a scalable, Markov chain Monte Carlo-based approximation of $\mathcal{L}^{C}$, and converges to $\mathcal{L}^{C}$ as the amount of sampling goes to infinity. The spectral partitioning nomination scheme is also extended to the extended spectral partitioning nomination scheme, $\mathcal{L}^{EP}$, which introduces a novel semisupervised clustering framework to improve upon the precision of $\mathcal{L}^P$. Real-data and simulation experiments are employed to illustrate the precision of these vertex nomination schemes, as well as their empirical computational complexity.

Keywords: vertex nomination, Markov chain Monte Carlo, spectral partitioning, Mclust

MSC[2010]: 60J22, 65C40, 62H30, 62H25
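
To make the sampling idea concrete, the following is a minimal Python sketch of a Metropolis sampler in the spirit of $\mathcal{L}^{CS}$. It is an illustration under stated assumptions, not the authors' implementation: it assumes the block probability matrix `B` is known and symmetric, that the block sizes are known, that the prior over label assignments consistent with those sizes is uniform, and that proposals swap the labels of two unlabeled vertices. The function names (`sample_sbm`, `swap_delta`, `nominate_by_sampling`) and all parameter choices are hypothetical. Unlabeled vertices are ranked by how often the chain places them in the interesting block.

```python
import math
import random


def sample_sbm(block_of, B, rng):
    """Sample a symmetric, loop-free adjacency matrix from a stochastic block model."""
    n = len(block_of)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < B[block_of[i]][block_of[j]]:
                A[i][j] = A[j][i] = 1
    return A


def edge_logp(a, p):
    """Log-probability of observing edge indicator a under edge probability p."""
    return math.log(p) if a else math.log(1.0 - p)


def swap_delta(A, labels, B, u, v):
    """Change in log-likelihood if u and v exchanged labels (assumes symmetric B).

    Only pairs involving u or v change; the pair (u, v) itself is unchanged
    because B is symmetric.
    """
    lu, lv = labels[u], labels[v]
    delta = 0.0
    for w in range(len(labels)):
        if w == u or w == v:
            continue
        lw = labels[w]
        delta += edge_logp(A[u][w], B[lv][lw]) - edge_logp(A[u][w], B[lu][lw])
        delta += edge_logp(A[v][w], B[lu][lw]) - edge_logp(A[v][w], B[lv][lw])
    return delta


def nominate_by_sampling(A, B, seeds, block_sizes, interesting=0,
                         n_steps=6000, burn_in=1500, rng=None):
    """Rank unlabeled vertices by estimated probability of the interesting block.

    seeds maps each labeled vertex to its observed block (assumption of this sketch).
    """
    rng = rng or random.Random(0)
    n = len(A)
    unlabeled = [v for v in range(n) if v not in seeds]

    # Initial state: distribute the remaining slots of each block at random.
    remaining = []
    for k, size in enumerate(block_sizes):
        known = sum(1 for b in seeds.values() if b == k)
        remaining += [k] * (size - known)
    rng.shuffle(remaining)
    labels = [0] * n
    for v, b in seeds.items():
        labels[v] = b
    for v, b in zip(unlabeled, remaining):
        labels[v] = b

    hits = {v: 0 for v in unlabeled}
    for step in range(n_steps):
        u, v = rng.sample(unlabeled, 2)
        if labels[u] != labels[v]:
            delta = swap_delta(A, labels, B, u, v)
            # Metropolis acceptance for a symmetric swap proposal.
            if delta >= 0 or rng.random() < math.exp(delta):
                labels[u], labels[v] = labels[v], labels[u]
        if step >= burn_in:
            for w in unlabeled:
                hits[w] += labels[w] == interesting

    # Most frequently "interesting" vertices come first in the nomination list.
    return sorted(unlabeled, key=lambda w: -hits[w])
```

A typical use would sample a small graph with `sample_sbm`, pass a handful of observed labels as `seeds`, and read the nomination list from the return value of `nominate_by_sampling`. Increasing `n_steps` corresponds to increasing the amount of sampling, which the abstract identifies as the regime in which $\mathcal{L}^{CS}$ converges to $\mathcal{L}^{C}$.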
