SI LG Feb 27, 2020

DSSLP: A Distributed Framework for Semi-supervised Link Prediction

arXiv:2002.12056v2, 4 citations
AI Analysis

This work addresses scalability issues for link prediction in industrial applications like recommendation and fraud detection, but it is incremental as it builds on existing distributed and sampling techniques.

The authors tackle the challenge of training link prediction models on industrial-scale graphs with billions of nodes and edges by developing DSSLP, a distributed framework that trains on k-hops neighborhoods and uses adaptive sampling. They report that it is both effective and efficient on public and real-world datasets.

Link prediction is widely used in a variety of industrial applications, such as merchant recommendation and fraudulent transaction detection. However, it is a great challenge to train and deploy a link prediction model on industrial-scale graphs with billions of nodes and edges. In this work, we present a scalable and distributed framework for the semi-supervised link prediction problem (named DSSLP), which is able to handle industrial-scale graphs. Instead of training the model on the whole graph, DSSLP trains on the k-hops neighborhood of nodes in a mini-batch setting, which helps reduce the scale of the input graph and distribute the training procedure. In order to generate negative examples effectively, DSSLP contains a distributed batched runtime sampling module. It implements uniform and dynamic sampling approaches, and is able to adaptively construct positive and negative examples to guide the training process. Moreover, DSSLP proposes a model-split strategy to accelerate the inference process of the link prediction task. Experimental results demonstrate the effectiveness and efficiency of DSSLP on several public datasets as well as on real-world datasets of industrial-scale graphs.
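
To make the two central ideas of the abstract concrete, the following is a minimal single-machine sketch of (a) restricting a mini-batch to the k-hops neighborhood of its nodes and (b) drawing uniform negative examples within that batch. It is not the DSSLP implementation: the function names (`k_hop_neighborhood`, `induced_subgraph`, `uniform_negatives`), the adjacency-dictionary graph format, and the choice of one negative per positive edge are all assumptions made for illustration. The dynamic sampling approach and the distributed runtime are not covered.

```python
from collections import deque
import random


def k_hop_neighborhood(adj, seeds, k):
    """Return all nodes within k hops of any seed (multi-source BFS)."""
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return visited


def induced_subgraph(adj, nodes):
    """Keep only the edges whose endpoints both lie in `nodes`."""
    return {u: [v for v in adj.get(u, ()) if v in nodes] for u in nodes}


def uniform_negatives(adj, batch_edges, candidates, rng):
    """For each positive edge (u, v), draw one node w uniformly from
    `candidates` such that (u, w) is not an edge (assumed scheme)."""
    negatives = []
    for u, _ in batch_edges:
        pool = [w for w in candidates if w != u and w not in adj.get(u, ())]
        if pool:
            negatives.append((u, rng.choice(pool)))
    return negatives


# Toy path graph 0 - 1 - 2 - 3 - 4, stored as an undirected adjacency dict.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
batch_edges = [(0, 1)]                      # one positive edge in the mini-batch
seeds = {n for edge in batch_edges for n in edge}
nodes = k_hop_neighborhood(adj, seeds, k=1)  # nodes reachable within 1 hop
sub = induced_subgraph(adj, nodes)
negs = uniform_negatives(sub, batch_edges, sorted(nodes), random.Random(0))
```

In this sketch the model would only ever see `sub`, whose size depends on `k` and the batch rather than on the full graph. Sampling negatives from the batch's own neighborhood is one simple design choice; the paper's sampling module may select candidates differently.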

