LG MLMay 20, 2020

Understanding Negative Sampling in Graph Representation Learning

Zhen Yang, Ming Ding, Chang Zhou, Hongxia Yang, Jingren Zhou, Jie Tang

arXiv:2005.09863v225.4220 citationsHas Code

Originality Highly original

AI Analysis

This work addresses a critical bottleneck in graph representation learning for tasks like link prediction and node classification, offering a novel theoretical and practical contribution.

The paper tackles the insufficient exploration of negative sampling in graph representation learning by theoretically demonstrating its importance and proposing MCNS, a method that approximates the positive distribution and accelerates sampling, achieving robust and superior results across 5 datasets and 19 experimental settings.

Graph representation learning has been extensively studied in recent years. Despite its potential in generating continuous embeddings for various networks, both the effectiveness and efficiency to infer high-quality representations toward large corpus of nodes are still challenging. Sampling is a critical point to achieve the performance goals. Prior arts usually focus on sampling positive node pairs, while the strategy for negative sampling is left insufficiently explored. To bridge the gap, we systematically analyze the role of negative sampling from the perspectives of both objective and risk, theoretically demonstrating that negative sampling is as important as positive sampling in determining the optimization objective and the resulted variance. To the best of our knowledge, we are the first to derive the theory and quantify that the negative sampling distribution should be positively but sub-linearly correlated to their positive sampling distribution. With the guidance of the theory, we propose MCNS, approximating the positive distribution with self-contrast approximation and accelerating negative sampling by Metropolis-Hastings. We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification and personalized recommendation, on a total of 19 experimental settings. These relatively comprehensive experimental results demonstrate its robustness and superiorities.

View on arXiv PDF Code

Similar