DCLG · Jan 1

Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving

Georgia Tech
arXiv:2601.00397v1 · 4 citations · h-index: 8
Originality: Highly original
AI Analysis

This work addresses the high cost and time burden of evaluating LLM serving configurations on GPU clusters, offering developers and researchers a faster and cheaper alternative.

The paper tackles the problem of efficiently evaluating LLM serving configurations by introducing Revati, a time-warp emulator that executes real serving-system code directly without physical GPUs. Revati achieves less than 5% prediction error while running 5-17x faster than real GPU execution.
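A time-warp emulator replaces kernel execution with advances of a virtual clock. The block below is a minimal sketch of that idea for a single emulated GPU, not Revati's implementation. It makes three assumptions:

- Streams are in-order.
- Kernel durations come from a fixed lookup table rather than a real performance model.
- Every launch costs the host a constant overhead.

`EmulatedDevice`, `PREDICTED_US`, and `launch_overhead_us` are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical predicted kernel durations in microseconds. A real emulator
# would obtain these from a performance model, not a hard-coded table.
PREDICTED_US = {"attention": 120.0, "mlp": 80.0, "sampler": 15.0}


@dataclass
class EmulatedDevice:
    """Minimal time-warp model of one GPU: no kernel is ever executed."""

    launch_overhead_us: float = 5.0  # assumed host-side cost per launch
    host_now_us: float = 0.0  # virtual time seen by the CPU thread
    # Per-stream virtual time at which all queued work will have finished.
    stream_ready_us: dict = field(default_factory=dict)

    def launch(self, kernel: str, stream: int = 0) -> None:
        # Asynchronous launch with in-order stream semantics: the kernel starts
        # once the host has issued it AND earlier work on the stream is done.
        start = max(self.host_now_us, self.stream_ready_us.get(stream, 0.0))
        self.stream_ready_us[stream] = start + PREDICTED_US[kernel]
        # The host pays only the launch overhead and returns immediately.
        self.host_now_us += self.launch_overhead_us

    def synchronize(self, stream: int = 0) -> float:
        # Time jump: rather than waiting, fast-forward the host clock to the
        # virtual moment the stream would have drained.
        self.host_now_us = max(self.host_now_us, self.stream_ready_us.get(stream, 0.0))
        return self.host_now_us


dev = EmulatedDevice()
for k in ("attention", "mlp", "sampler"):
    dev.launch(k)
print(dev.host_now_us)    # 15.0: three launches return almost instantly
print(dev.synchronize())  # 215.0: clock jumps to 120 + 80 + 15
```

The serving code above the device layer still observes realistic timing: launches are cheap and synchronization "waits" for the predicted kernel time. No wall-clock time is spent on the kernels themselves.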

Deploying LLMs efficiently requires testing hundreds of serving configurations, but evaluating each one on a GPU cluster takes hours and costs thousands of dollars. Discrete-event simulators are faster and cheaper, but they require re-implementing the serving system's control logic -- a burden that compounds as frameworks evolve. We present Revati, a time-warp emulator that enables performance modeling by directly executing real serving system code at simulation-like speed. The system intercepts CUDA API calls to virtualize device management, allowing serving frameworks to run without physical GPUs. Instead of executing GPU kernels, it performs time jumps -- fast-forwarding virtual time by predicted kernel durations. We propose a coordination protocol that synchronizes these jumps across distributed processes while preserving causality. On vLLM and SGLang, Revati achieves less than 5% prediction error across multiple models and parallelism configurations, while running 5-17x faster than real GPU execution.
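Suppose several processes (for example, tensor-parallel ranks) each jump their own virtual clocks. Causality then requires that a collective operation not complete on any rank before the slowest participant has reached it.

The block below sketches one simple rule under that constraint: completion time = max(arrival times) + predicted collective duration. It assumes a single collective with a known predicted duration, and it uses Python threads to stand in for processes. The abstract does not detail Revati's actual coordination protocol, so this is only an illustration. `CollectiveCoordinator` and `arrive` are hypothetical names.

```python
from __future__ import annotations

import threading


class CollectiveCoordinator:
    """Hypothetical coordinator for one collective among `world_size` ranks.

    Assumed causality rule: the collective cannot finish before the last rank
    arrives, so every rank jumps to max(arrivals) + predicted duration.
    """

    def __init__(self, world_size: int, predicted_duration_us: float):
        self.world_size = world_size
        self.predicted_duration_us = predicted_duration_us
        self._arrivals: list[float] = []
        self._completion: float | None = None
        self._cond = threading.Condition()

    def arrive(self, virtual_now_us: float) -> float:
        """Block until all ranks have arrived, then return the shared completion time."""
        with self._cond:
            self._arrivals.append(virtual_now_us)
            if len(self._arrivals) == self.world_size:
                # Last rank in: the slowest arrival gates everyone.
                self._completion = max(self._arrivals) + self.predicted_duration_us
                self._cond.notify_all()
            else:
                self._cond.wait_for(lambda: self._completion is not None)
            return self._completion


def run_rank(coord, start_us, results, rank):
    # Each "rank" reaches the collective at its own virtual time and adopts
    # the agreed completion time as its new clock value.
    results[rank] = coord.arrive(start_us)


coord = CollectiveCoordinator(world_size=3, predicted_duration_us=50.0)
results = [None] * 3
threads = [
    threading.Thread(target=run_rank, args=(coord, t, results, r))
    for r, t in enumerate([100.0, 250.0, 180.0])
]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(results)  # [300.0, 300.0, 300.0]
```

The ranks that arrived early (at 100 and 180 µs) do not race ahead. All three resume at 300 µs, which is the latest arrival (250 µs) plus the 50 µs predicted collective time.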
