DC, LG · Jan 1

Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving

Amey Agrawal, Mayank Yadav, Sukrit Kumar, Anirudha Agrawal, Garv Ghai, Souradeep Bera, Elton Pinto, Sirish Gambhira, Mohammad Adain, Kasra Sohrab, Chus Antonanzas, Alexey Tumanov

Georgia Tech

arXiv:2601.00397v15.14 citations · h-index: 8

Originality: Highly original

AI Analysis

Revati addresses the high cost and time burden of evaluating LLM serving configurations on GPU clusters, offering developers and researchers a faster and cheaper alternative.

The paper tackles the problem of efficiently testing LLM serving configurations by introducing Revati, a time-warp emulator that directly executes real serving system code without physical GPUs. On vLLM and SGLang, it achieves less than 5% prediction error while running 5-17x faster than real GPU execution.

Deploying LLMs efficiently requires testing hundreds of serving configurations, but evaluating each one on a GPU cluster takes hours and costs thousands of dollars. Discrete-event simulators are faster and cheaper, but they require re-implementing the serving system's control logic -- a burden that compounds as frameworks evolve. We present Revati, a time-warp emulator that enables performance modeling by directly executing real serving system code at simulation-like speed. The system intercepts CUDA API calls to virtualize device management, allowing serving frameworks to run without physical GPUs. Instead of executing GPU kernels, it performs time jumps -- fast-forwarding virtual time by predicted kernel durations. We propose a coordination protocol that synchronizes these jumps across distributed processes while preserving causality. On vLLM and SGLang, Revati achieves less than 5% prediction error across multiple models and parallelism configurations, while running 5-17x faster than real GPU execution.
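To make the idea of coordinated time jumps concrete, here is a minimal Python sketch. It is not Revati's protocol, which the abstract does not specify. It assumes a simple conservative rule: the shared virtual clock advances only once every worker has a pending jump, and only to the earliest requested wake-up time, so no worker can observe an event from its own future. The names `TimeWarpCoordinator`, `request_jump`, and `step` are hypothetical, and the durations stand in for the output of a kernel-duration predictor.

```python
from dataclasses import dataclass, field


@dataclass
class TimeWarpCoordinator:
    """Toy coordinator for a shared virtual clock (illustrative only)."""

    num_workers: int
    now: float = 0.0
    # Maps worker id -> virtual time at which that worker wants to wake up.
    pending: dict = field(default_factory=dict)

    def request_jump(self, worker: int, predicted_duration: float) -> None:
        # Instead of running a GPU kernel, the worker asks to skip ahead
        # by the predicted kernel duration (assumed to be given).
        if worker in self.pending:
            raise ValueError(f"worker {worker} already has a pending jump")
        self.pending[worker] = self.now + predicted_duration

    def step(self) -> list:
        """Advance virtual time at most once; return the workers that wake."""
        if len(self.pending) < self.num_workers:
            # Some worker is still executing real code, so jumping now
            # could skip past an event it is about to produce.
            return []
        # Jump to the earliest wake-up time, never beyond it.
        self.now = min(self.pending.values())
        woken = sorted(w for w, t in self.pending.items() if t <= self.now)
        for w in woken:
            del self.pending[w]
        return woken


coord = TimeWarpCoordinator(num_workers=2)
coord.request_jump(0, 3.0)   # worker 0 "launches" a 3.0-unit kernel
coord.step()                 # no jump yet: worker 1 has not blocked
coord.request_jump(1, 5.0)   # worker 1 "launches" a 5.0-unit kernel
coord.step()                 # clock jumps to 3.0 and wakes worker 0
```

In this sketch no wall-clock time is spent waiting for a kernel: the cost of a 3.0-unit kernel is a dictionary update and a `min`, which is the intuition behind running faster than real GPU execution.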

