Improving Skip-Gram based Graph Embeddings via Centrality-Weighted Sampling
This work addresses a specific bottleneck in Skip-Gram based graph embedding methods, namely the distribution from which nodes are sampled, and offers researchers and practitioners incremental improvements in training efficiency and accuracy.
The paper tackles the problem of how node sampling distributions affect graph embedding performance, showing that centrality-weighted sampling speeds up learning by up to 2 times and increases accuracy in node classification tasks.
Network embedding techniques inspired by word2vec are an effective model for unsupervised relational learning. Typically, these techniques use a Skip-Gram procedure to learn low-dimensional vector representations of the nodes in a graph by sampling node-context examples. Although many ways of sampling the context of a node have been proposed, the effect of how the node itself is chosen has not been analyzed in depth. To fill this gap, we have re-implemented the four main word2vec-inspired graph embedding techniques under the same framework and analyzed how different sampling distributions affect embedding performance when tested on node classification problems. We present a set of experiments on several well-known real data sets showing that sampling nodes according to popular centrality distributions leads to improvements, with speed-ups of up to 2 times in learning time and increased accuracy in all cases.
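
As an illustration of the idea, the following minimal Python sketch draws the starting node of each random walk with probability proportional to a centrality score and then extracts Skip-Gram node-context pairs from the walk. It is not the implementation evaluated in the paper: the abstract does not state which centrality measures or which context sampling strategies are used, so degree centrality, uniform random walks, the toy graph, and all function names (`degree_weights`, `random_walk`, `skipgram_pairs`, `sample_pairs`) are assumptions made for this example.

```python
import random
from collections import Counter


def degree_weights(adj):
    """Return the nodes and one sampling weight per node.

    Assumption: degree is used as the centrality score. Any other
    non-negative centrality could be substituted here.
    """
    nodes = sorted(adj)
    weights = [len(adj[n]) for n in nodes]
    return nodes, weights


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def skipgram_pairs(walk, window):
    """Emit (node, context) pairs for every position within `window` steps."""
    pairs = []
    for i, node in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((node, walk[j]))
    return pairs


def sample_pairs(adj, num_walks, walk_length, window, seed=0):
    """Sample walk starts by centrality, then collect Skip-Gram pairs.

    Assumes at least one node has a non-zero weight.
    """
    rng = random.Random(seed)
    nodes, weights = degree_weights(adj)
    # Centrality-weighted choice of the start node (with replacement).
    starts = rng.choices(nodes, weights=weights, k=num_walks)
    pairs = []
    for start in starts:
        walk = random_walk(adj, start, walk_length, rng)
        pairs.extend(skipgram_pairs(walk, window))
    return starts, pairs


# Hypothetical toy graph: hub "a" plus one extra edge between "b" and "c".
adj = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
    "e": ["a"],
}

starts, pairs = sample_pairs(adj, num_walks=2000, walk_length=5, window=2)
start_counts = Counter(starts)  # how often each node was chosen as a start
```

In this sketch the hub node is chosen as a walk start more often than the leaf nodes, so the node-context pairs passed to the Skip-Gram model are concentrated around central nodes. Replacing the weights with a constant recovers uniform node sampling, which makes the two distributions easy to compare under otherwise identical settings.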