STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices
For operators of large-scale microservice systems, STLGT provides accurate and efficient tail-latency forecasting for proactive SLO management.
STLGT predicts p95 tail latency in microservices using a linear graph Transformer on span graphs, achieving 8.5% lower MAPE than PERT-GNN and up to 12x faster CPU inference on Alibaba traces.
Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, it remains challenging to model long-range dependency propagation and non-stationary, bursty workloads while keeping inference efficient at scale. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at a span graph size of N=32, the maximum span graph size after preprocessing the Alibaba traces. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic.
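To illustrate why attention can run in time linear in span graph size, the following is a minimal sketch of generic kernelized linear attention over the N spans of one graph. It is not STLGT's actual layer: the feature map (elu(x) + 1), the function names, and the dimensions are assumptions chosen for illustration, and the structure-aware component described above is omitted entirely.

```python
import numpy as np


def feature_map(x):
    # Assumed positive feature map elu(x) + 1, a common choice in
    # kernelized linear attention; STLGT's own choice may differ.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(Q, K, V):
    """Attention over N span embeddings in O(N * d * d_v) time.

    Q, K: (N, d) queries and keys, one row per span.
    V:    (N, d_v) values.
    """
    phi_q = feature_map(Q)
    phi_k = feature_map(K)
    kv = phi_k.T @ V           # (d, d_v): keys and values summarized once
    z = phi_k.sum(axis=0)      # (d,): normalizer summary
    num = phi_q @ kv           # (N, d_v)
    den = phi_q @ z            # (N,)
    return num / den[:, None]


def quadratic_reference(Q, K, V):
    # Same attention weights, but built via the explicit (N, N) matrix.
    phi_q, phi_k = feature_map(Q), feature_map(K)
    A = phi_q @ phi_k.T
    A = A / A.sum(axis=1, keepdims=True)
    return A @ V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d, d_v = 32, 8, 4       # N=32 mirrors the span graph size quoted above
    Q = rng.normal(size=(N, d))
    K = rng.normal(size=(N, d))
    V = rng.normal(size=(N, d_v))
    out = linear_attention(Q, K, V)
    print(out.shape, np.allclose(out, quadratic_reference(Q, K, V)))
```

Both functions compute the same result; the linear variant reorders the matrix products so that the N-by-N attention matrix is never materialized, which is the general mechanism behind cost that grows linearly with the number of spans.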