StreamTGN: A GPU-Efficient Serving System for Streaming Temporal Graph Neural Networks
This work addresses a critical bottleneck for real-time applications of dynamic graph learning, such as social network analysis or recommendation systems, by enabling efficient, low-latency inference.
The paper tackles the inefficiency of existing Temporal Graph Neural Network (TGN) inference systems, which update all node embeddings for each new edge. It introduces StreamTGN, a streaming inference system that exploits locality to update only the affected nodes, achieving speedups of 4.5x to 739x on eight temporal graphs with no accuracy loss.
Temporal Graph Neural Networks (TGNs) achieve state-of-the-art performance on dynamic graph tasks, yet existing systems focus exclusively on accelerating training; at inference time, every new edge triggers $O(|V|)$ embedding updates even though only a small fraction of nodes are affected. We present \textbf{StreamTGN}, the first streaming TGN inference system that exploits the inherent locality of temporal graph updates: in an $L$-layer TGN, a new edge affects only nodes within $L$ hops of its endpoints, typically fewer than 0.2\% of all nodes on million-node graphs. StreamTGN maintains persistent GPU-resident node memory and uses dirty-flag propagation to identify the affected set $\mathcal{A}$, reducing per-batch complexity from $O(|V|)$ to $O(|\mathcal{A}|)$ with zero accuracy loss. Drift-aware adaptive rebuild scheduling and batched streaming with relaxed ordering further maximize throughput. Experiments on eight temporal graphs (2K--2.6M nodes) show 4.5$\times$--739$\times$ speedup for TGN and up to 4,207$\times$ for TGAT, with identical accuracy. StreamTGN is orthogonal to training optimizations: combining SWIFT with StreamTGN yields a 24$\times$ end-to-end speedup across three architectures (TGN, TGAT, DySAT).
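
The locality argument can be illustrated with a small CPU-side sketch. This is not StreamTGN's GPU implementation; it is a minimal illustration, assuming an undirected graph stored as adjacency sets, of how dirty flags could be propagated $L$ hops outward from the endpoints of a batch of new edges to obtain the affected set $\mathcal{A}$. The names `affected_set`, `adj`, `new_edges`, and `num_layers` are hypothetical and chosen only for this example.

```python
from collections import defaultdict, deque


def affected_set(adj, new_edges, num_layers):
    """Return the nodes within `num_layers` hops of any new-edge endpoint.

    Assumptions (not from the paper): `adj` maps node -> set of neighbors,
    the graph is undirected, and new edges are inserted before propagation.
    """
    # Insert the new edges so that propagation sees the updated graph.
    for u, v in new_edges:
        adj[u].add(v)
        adj[v].add(u)

    dirty = set()       # nodes flagged as needing an embedding update
    frontier = deque()  # (node, hop distance from the nearest endpoint)
    for u, v in new_edges:
        for node in (u, v):
            if node not in dirty:
                dirty.add(node)
                frontier.append((node, 0))

    # Multi-source breadth-first search, truncated at `num_layers` hops.
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_layers:
            continue
        for nbr in adj[node]:
            if nbr not in dirty:
                dirty.add(nbr)
                frontier.append((nbr, depth + 1))
    return dirty


# Toy example: a path graph 0-1-2-...-7, then a new edge (0, 8) arrives.
graph = defaultdict(set)
for i in range(7):
    graph[i].add(i + 1)
    graph[i + 1].add(i)

touched = affected_set(graph, [(0, 8)], num_layers=2)
```

In this toy graph only four of the nine nodes are flagged for a two-layer model, so only those embeddings would need recomputation; the remaining entries could be reused as-is. The cost of the traversal scales with the size of the flagged neighborhood rather than with $|V|$, which is the intuition behind the $O(|\mathcal{A}|)$ bound stated above.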