LG | SI | ML | May 2, 2019

Network Representation Learning: Consolidation and Renewed Bearing

arXiv:1905.00987v2 | 24 citations
Originality: Synthesis-oriented
AI Analysis

This work provides a comprehensive benchmark for researchers and practitioners in machine learning and data science, offering insights into method performance and guidance for algorithm selection. As a survey, it is incremental: it introduces no new methods.

The authors conducted a large-scale experimental survey benchmarking 12 unsupervised network representation learning methods on 15 datasets for link prediction and node classification. They find that certain baseline methods can compete on some types of datasets when tuned appropriately; that recent matrix factorization methods offer a small but relatively consistent advantage, with MNMF the most competitive for link prediction and NetMF for node classification; and that no single method outperforms the others on both tasks.

Graphs are a natural abstraction for many problems where nodes represent entities and edges represent a relationship across entities. An important area of research that has emerged over the last decade is the use of graphs as a vehicle for non-linear dimensionality reduction, in a manner akin to previous efforts based on manifold learning, with uses for downstream database processing, machine learning and visualization. In this systematic yet comprehensive experimental survey, we benchmark several popular network representation learning methods operating on two key tasks: link prediction and node classification. We examine the performance of 12 unsupervised embedding methods on 15 datasets. To the best of our knowledge, the scale of our study, both in terms of the number of methods and the number of datasets, is the largest to date. Our results reveal several key insights about work to date in this space. First, we find that certain baseline methods (task-specific heuristics, as well as classic manifold methods) that have often been dismissed or are not considered by previous efforts can compete on certain types of datasets if they are tuned appropriately. Second, we find that recent methods based on matrix factorization offer a small but relatively consistent advantage over alternative methods (e.g., random-walk based methods) from a qualitative standpoint. Specifically, we find that MNMF, a community-preserving embedding method, is the most competitive method for the link prediction task, while NetMF is the most competitive baseline for node classification. Third, no single method completely outperforms other embedding methods on both node classification and link prediction tasks. We also present several drill-down analyses that reveal settings under which certain algorithms perform well (e.g., the role of neighborhood context on performance), guiding the end user.
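To make the link prediction task and the idea of a "task-specific heuristic" baseline concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's evaluation protocol: the toy graph, the choice of the common-neighbors heuristic, the held-out edges, and all function names (`build_adjacency`, `common_neighbors_score`, `auc`) are assumptions made for this example.

```python
from itertools import combinations


def build_adjacency(nodes, edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors_score(adj, u, v):
    """Heuristic link score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy graph: two 4-node cliques joined by a single bridge edge (3, 4).
nodes = list(range(8))
all_edges = {
    (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
    (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
    (3, 4),
}

# Hide two true edges; the heuristic only sees the remaining training graph.
held_out = [(0, 1), (5, 6)]
train_edges = all_edges - set(held_out)
adj = build_adjacency(nodes, train_edges)

# Negatives: node pairs that are not edges in the full graph.
non_edges = [(u, v) for u, v in combinations(nodes, 2) if (u, v) not in all_edges]

pos = [common_neighbors_score(adj, u, v) for u, v in held_out]
neg = [common_neighbors_score(adj, u, v) for u, v in non_edges]
link_auc = auc(pos, neg)
print(f"common-neighbors AUC on toy graph: {link_auc:.2f}")
```

An embedding method would be evaluated the same way in this sketch by swapping `common_neighbors_score` for a score computed from learned node vectors (for example, a dot product). The toy graph is deliberately easy, so the result here says nothing about how heuristics and embeddings compare on real datasets.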

Code Implementations: 1 repo
