Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference
For researchers and practitioners deploying LLMs in HPC, this work provides the first comprehensive understanding of how soft errors affect LLM inference, offering practical guidance for error detection and mitigation.
This paper systematically studies soft error propagation in LLM inference using a new fault-injection framework (LLMFI), revealing 17 key takeaways and four low-overhead reliability improvement directions across three models and thirteen tasks.
Large language models (LLMs) are increasingly integrated into high-performance computing (HPC) workflows, accelerating scientific discovery through applications such as code generation and domain-specific decision-making. Yet how soft errors propagate through and affect LLM inference remains largely unexplored. To bridge this gap, we present a comprehensive study of error propagation in LLM inference, enabled by LLMFI, our configurable and deterministic fault-injection framework. Using LLMFI, we systematically inject faults into three open-weight LLMs across thirteen representative tasks covering reasoning, multilingual, mathematical, and coding domains. In addition, we conduct fine-grained case studies that reveal critical vulnerability patterns. Overall, our study yields 17 takeaways that advance the understanding of error propagation in LLM inference and introduces four low-overhead directions for improving reliability through software-only modifications, offering practical guidance for future error detection and mitigation.
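To make the notion of a soft error concrete, the following minimal sketch flips a single bit in the IEEE 754 float32 encoding of a value. It is not LLMFI's implementation: the abstract does not specify the fault model, so the single-bit-flip assumption, the float32 format, and the helper name `flip_bit_float32` are all illustrative choices.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value`, rounded to float32, with one bit of its encoding flipped.

    Hypothetical helper for illustration only (not part of LLMFI).
    Bit 0 is the least significant mantissa bit, bits 23-30 are the
    exponent, and bit 31 is the sign.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask flips exactly one bit.
    raw ^= 1 << bit
    # Reinterpret the modified integer as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


if __name__ == "__main__":
    for bit in (0, 23, 30, 31):
        print(f"bit {bit:2d}: 1.0 -> {flip_bit_float32(1.0, bit)!r}")
```

Under these assumptions, the same fault applied to the value 1.0 has very different consequences depending on the bit position: flipping bit 0 perturbs the value only slightly, flipping bit 23 halves it to 0.5, flipping bit 31 negates it, and flipping bit 30 turns it into infinity.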