Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

Mengtian Yang, Zhekun Zhang, Mingheng Wu, Jianwen Yan, Hanshi Sun, Li-wen Chang

arXiv:2605.17164


AI Analysis

For system engineers and researchers deploying large-scale LLMs, Charon provides accurate performance simulation to guide optimization and system studies.

Charon is a unified simulator for predicting LLM training and inference performance, achieving prediction errors under 5.35% overall and under 3.74% for large-scale training, and it discovered a configuration that improved throughput over an engineering-tuned baseline.

Deploying large-scale LLM training and inference with optimal performance is exceptionally challenging due to a complex design space of parallelism strategies, system optimizations, and hardware configurations. Accurate and rapid performance simulation is critical for guiding optimization efforts and system studies by validating "what-if" hypotheses. To address this, we introduce Charon, a unified, modular, and fine-grained simulator for accurately predicting LLM performance. Experiments show Charon achieves high accuracy across different models and configurations, with an overall prediction error consistently under 5.35%, and even under 3.74% for training with a large-scale GPU cluster. In a practical inference deployment case, Charon discovered a configuration that improved system throughput over an engineering-tuned baseline, demonstrating its significant real-world value.
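
The abstract does not say how prediction error is computed. A common reading is the absolute relative difference between the simulated and the measured value, and the minimal sketch below assumes that metric. The function name and the input numbers are hypothetical and purely illustrative; they are not taken from the paper.

```python
def relative_error_pct(predicted: float, measured: float) -> float:
    """Absolute relative error of a prediction, in percent.

    Assumed metric: the abstract does not define its error measure.
    """
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(predicted - measured) / abs(measured) * 100.0


# Illustrative numbers only, not results from the paper:
# a simulated step time of 1.03 s compared with a measured 1.00 s.
error = relative_error_pct(predicted=1.03, measured=1.00)
print(f"prediction error: {error:.2f}%")
```

Under this reading, a bound such as "under 5.35%" would mean that every evaluated configuration yields a value below 5.35 from a function like this one.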
