LG · AI · DC · SI · ML · Mar 28, 2019

PyTorch-BigGraph: A Large-scale Graph Embedding System

arXiv:1903.12287v3 · 411 citations
Originality: Incremental advance
AI Analysis

This work addresses the scalability bottleneck of graph embeddings in large-scale industrial applications. It is an incremental advance, because it builds on traditional multi-relation embedding systems and modifies them to scale.
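To make "multi-relation embedding" concrete, here is a minimal sketch of how such systems typically score an edge: each relation type has its own parameters, which transform one endpoint's embedding before it is compared with the other endpoint's embedding. The operator names (`translate`, `diagonal`), the dot-product comparator, and the margin ranking loss are illustrative assumptions chosen for this sketch, not PBG's actual API.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Comparator: inner product of two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def translate(v: Vector, rel: Vector) -> Vector:
    """Relation operator (assumed): add a per-relation offset vector, TransE-style."""
    return [x + r for x, r in zip(v, rel)]


def diagonal(v: Vector, rel: Vector) -> Vector:
    """Relation operator (assumed): scale each dimension by a per-relation weight."""
    return [x * r for x, r in zip(v, rel)]


def score(src: Vector, dst: Vector, rel: Vector,
          op: Callable[[Vector, Vector], Vector]) -> float:
    """Score edge (src, rel, dst): transform dst with the relation, then compare to src."""
    return dot(src, op(dst, rel))


def margin_loss(pos: float, neg: float, margin: float = 1.0) -> float:
    """Ranking loss: penalize a negative edge scoring within `margin` of the positive."""
    return max(0.0, margin - pos + neg)
```

For example, with `src = [1.0, 0.0]`, `dst = [0.5, 0.5]` and `rel = [0.5, -0.5]`, the translation operator gives a score of 1.0 and the diagonal operator gives 0.25. Swapping the operator changes how a relation is modeled without changing the rest of the training loop, which is why such systems usually expose the operator as a per-relation choice.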

The paper tackles the problem of scaling graph embedding methods to industrial-scale graphs with billions of nodes and trillions of edges. It presents PyTorch-BigGraph (PBG), which achieves performance comparable to existing embedding systems on common benchmarks while enabling training on graphs such as the full Freebase dataset, with over 100 million nodes and 2 billion edges.

Graph embedding methods produce unsupervised node features from graphs that can then be used for a variety of machine learning tasks. Modern graphs, particularly in industrial applications, contain billions of nodes and trillions of edges, which exceeds the capability of existing embedding systems. We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges. PBG uses graph partitioning to train arbitrarily large embeddings on either a single machine or in a distributed environment. We demonstrate comparable performance with existing embedding systems on common benchmarks, while allowing for scaling to arbitrarily large graphs and parallelization on multiple machines. We train and evaluate embeddings on several large social network graphs as well as the full Freebase dataset, which contains over 100 million nodes and 2 billion edges.
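The graph-partitioning idea in the abstract can be illustrated with a small sketch: split the nodes into P partitions, then group edges into at most P² buckets keyed by the partitions of their source and destination nodes, so that training on one bucket needs only the embeddings of two partitions in memory. The modulo partition assignment, the function names, the 100-dimensional float32 embeddings, and P = 8 in the arithmetic are all hypothetical choices for illustration; this is not PBG's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int, int]  # (source node, relation, destination node)


def partition_of(node: int, num_partitions: int) -> int:
    """Assign a node to a partition (hypothetical: modulo on its integer id)."""
    return node % num_partitions


def bucket_edges(edges: List[Edge],
                 num_partitions: int) -> Dict[Tuple[int, int], List[Edge]]:
    """Group edges by (source partition, destination partition): at most P*P buckets."""
    buckets: Dict[Tuple[int, int], List[Edge]] = defaultdict(list)
    for src, rel, dst in edges:
        key = (partition_of(src, num_partitions), partition_of(dst, num_partitions))
        buckets[key].append((src, rel, dst))
    return dict(buckets)


def embedding_bytes(num_nodes: int, dim: int, bytes_per_float: int = 4) -> int:
    """Size of a dense embedding table (float32 by default)."""
    return num_nodes * dim * bytes_per_float


def peak_bytes_per_bucket(num_nodes: int, dim: int, num_partitions: int,
                          bytes_per_float: int = 4) -> int:
    """Approximate memory to train one bucket (i, j): two equal-size partitions.

    A diagonal bucket (i, i) needs only one partition, so this is an upper bound.
    """
    per_partition = embedding_bytes(num_nodes, dim, bytes_per_float) // num_partitions
    return 2 * per_partition
```

As a worked example, at 100 million nodes and an assumed dimension of 100, the full table takes 100,000,000 × 100 × 4 bytes = 40 GB, while training one bucket with P = 8 touches about 10 GB. Raising P shrinks the per-bucket footprint further, at the cost of more buckets to iterate over and more partition swaps between memory and disk.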

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
