From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphs
This work addresses the need for real-time, low-latency inference on continuous-time dynamic graphs, which is crucial for applications such as social networks and recommendation systems. The contribution is incremental in that it builds on existing random-walk methods.
The paper tackles the high computational cost and latency of dynamic graph representation learning by proposing graph-sprints, a low-latency framework that approximates random-walk features using single-hop operations. On node classification tasks across five datasets, it achieves competitive performance and close to a 10x speed-up in inference.
Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints, a general-purpose feature extraction framework for continuous-time dynamic graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher-latency models. To achieve this, we propose a streaming, low-latency approximation to random-walk-based features. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our approach on three open-source datasets and two in-house datasets, and compare it with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines on the node classification tasks in five datasets). At the same time, graph-sprints significantly reduces inference latency, achieving close to an order of magnitude speed-up in our experimental setting.
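As a rough illustration of how multi-hop information can be summarized using only single-hop operations on incoming edges, consider the minimal Python sketch below. The class name `StreamingEmbeddings`, the parameters `alpha`, `beta`, and `decay_rate`, and the update rule itself are assumptions made for illustration. The abstract does not state the update rule, so this should not be read as the graph-sprints method itself.

```python
import math
from collections import defaultdict


class StreamingEmbeddings:
    """Hypothetical per-node summaries updated one edge at a time.

    Illustrative assumption only, not the update rule from the paper.
    """

    def __init__(self, dim, alpha=0.5, beta=0.5, decay_rate=0.1):
        self.alpha = alpha            # weight of the neighbor's summary (assumed)
        self.beta = beta              # weight of the node's own old summary (assumed)
        self.decay_rate = decay_rate  # how quickly old information fades over time (assumed)
        self.summary = defaultdict(lambda: [0.0] * dim)
        self.last_seen = {}

    def update(self, src, dst, t, edge_feat):
        """Process one edge src -> dst at time t; only dst's state is written."""
        # Time awareness: the longer since dst was last updated, the less of
        # its old summary is kept.
        dt = t - self.last_seen.get(dst, t)
        keep = self.beta * math.exp(-self.decay_rate * dt)

        # Single-hop operation: mix the edge features with the source's summary.
        # The source's summary already aggregates its own past neighbors, so
        # multi-hop information reaches dst without any graph traversal.
        neighbor = self.summary[src]
        incoming = [(1 - self.alpha) * f + self.alpha * n
                    for f, n in zip(edge_feat, neighbor)]

        old = self.summary[dst]
        self.summary[dst] = [keep * o + (1 - keep) * i
                             for o, i in zip(old, incoming)]
        self.last_seen[dst] = t
        return self.summary[dst]


emb = StreamingEmbeddings(dim=2, decay_rate=0.0)
emb.update("a", "b", 0.0, [1.0, 0.0])          # edge a -> b
print(emb.update("b", "c", 1.0, [0.0, 1.0]))   # edge b -> c
```

In this sketch, node `c` never interacts with node `a`, yet the first component of its summary is non-zero because the features of the earlier `a -> b` edge arrive through `b`'s summary. Each update costs time proportional to the embedding dimension and is independent of neighborhood size, which is the kind of property that makes a streaming approximation attractive for low-latency inference.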