LG · AI · Apr 29

STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices

arXiv:2604.26422
AI Analysis

For operators of large-scale microservice systems, STLGT provides accurate and efficient tail-latency forecasting for proactive SLO management.

STLGT forecasts p95 tail latency in microservices using a linear graph Transformer on span graphs. Averaged over a personalized education application, DeathStarBench, and Alibaba traces, it reports 8.5% lower MAPE than PERT-GNN, with up to 12x faster CPU inference at N=32.

Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at N=32, matching the maximum span graph size after preprocessing the Alibaba traces. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic.
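
To illustrate why attention can run in time linear in the span graph size, here is a minimal sketch of generic kernelized linear attention in NumPy. It is not STLGT's actual layer: the abstract does not specify the attention form, and the structure-aware part (span graph edges) is omitted entirely. The `elu(x) + 1` feature map, the function names, and the feature dimension `d = 8` are assumptions made for illustration; only N=32 comes from the abstract.

```python
import numpy as np

def feature_map(x):
    # Assumed positive feature map: elu(x) + 1 (x + 1 for x > 0, exp(x) otherwise).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Attention in O(N): summarise keys and values once, reuse for every query."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V          # (d, d_v) summary, independent of N after this point
    z = phi_k.sum(axis=0)     # (d,) normaliser
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_attention(Q, K, V):
    """Reference form: builds the full N x N attention matrix, O(N^2)."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    A = phi_q @ phi_k.T
    return (A / A.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(0)
N, d = 32, 8                  # N = 32 spans, as in the abstract; d is hypothetical
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = linear_attention(Q, K, V)
```

Both functions compute the same result; the linear version simply reorders the matrix products so that no N x N matrix is ever formed, which is the property that makes per-graph cost grow linearly with the number of spans.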
