CL | AI | Sep 26, 2025

InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models

arXiv:2509.22536v4 | 3 citations | h-index: 4 | Has Code
Originality: Incremental advance
AI Analysis

This work lowers the computational-cost barrier for AI researchers and practitioners by providing an open-source, efficient training method. It is an incremental advance, because it builds on existing FP8 quantization techniques.

The paper tackles the high computational cost of training Large Language Models by introducing an end-to-end FP8 training recipe, achieving performance comparable to BF16 baselines with up to a 22% reduction in training time, 14% decrease in memory usage, and 19% increase in throughput.

The immense computational cost of training Large Language Models (LLMs) presents a major barrier to innovation. While FP8 training offers a promising solution with significant theoretical efficiency gains, its widespread adoption has been hindered by the lack of a comprehensive, open-source training recipe. To bridge this gap, we introduce an end-to-end FP8 training recipe that seamlessly integrates continual pre-training and supervised fine-tuning. Our methodology employs a fine-grained, hybrid-granularity quantization strategy to maintain numerical fidelity while maximizing computational efficiency. Through extensive experiments, including the continual pre-training of models on a 160B-token corpus, we demonstrate that our recipe is not only remarkably stable but also essentially lossless, achieving performance on par with the BF16 baseline across a suite of reasoning benchmarks. Crucially, this is achieved with substantial efficiency improvements, including up to a 22% reduction in training time, a 14% decrease in peak memory usage, and a 19% increase in throughput. Our results establish FP8 as a practical and robust alternative to BF16, and we will release the accompanying code to further democratize large-scale model training.
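
The abstract does not spell out what "fine-grained, hybrid-granularity quantization" looks like in practice. The sketch below is one plausible reading, not the authors' implementation: it simulates FP8 (E4M3) rounding in NumPy and applies one scale per tile, with a different tile shape for weights than for activations. The tile shapes (128x128 for weights, 1x128 for activations) and the helper names `round_to_e4m3` and `quantize_groups` are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format


def round_to_e4m3(x):
    """Simulate rounding to the E4M3 grid (3 mantissa bits, min normal exponent -6)."""
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    mag = np.abs(x)
    # Zeros and subnormals share the minimum exponent; avoid log2(0).
    safe = np.where(mag > 0, mag, 1.0)
    exp = np.maximum(np.floor(np.log2(safe)), -6.0)
    step = 2.0 ** (exp - 3)  # spacing of representable values at this exponent
    return np.round(x / step) * step


def quantize_groups(x, group_shape):
    """Fake-quantize a 2-D array with one scale per tile; returns dequantized values."""
    rows, cols = x.shape
    gr, gc = group_shape
    assert rows % gr == 0 and cols % gc == 0, "tile shape must divide the array"
    tiles = x.reshape(rows // gr, gr, cols // gc, gc)
    amax = np.abs(tiles).max(axis=(1, 3), keepdims=True)
    # Map each tile's largest magnitude onto the FP8 maximum.
    scale = np.where(amax > 0, amax / FP8_E4M3_MAX, 1.0)
    q = round_to_e4m3(tiles / scale)
    return (q * scale).reshape(rows, cols)


rng = np.random.default_rng(0)

# Activations with a single large outlier (hypothetical data).
acts = rng.standard_normal((4, 256))
acts[0, 0] = 1e5

coarse = quantize_groups(acts, (4, 256))  # one scale for the whole tensor
fine = quantize_groups(acts, (1, 128))    # assumed per-token, per-128-channel scale

err_coarse = np.abs(coarse - acts).mean()
err_fine = np.abs(fine - acts).mean()
print(f"mean abs error, per-tensor scale: {err_coarse:.4f}")
print(f"mean abs error, 1x128 tiles:      {err_fine:.4f}")

# Weights use a coarser (assumed) 128x128 block, which keeps scale overhead low.
weights = rng.standard_normal((256, 256))
weights_q = quantize_groups(weights, (128, 128))
```

With a single scale, the outlier pushes every other value toward the subnormal end of the FP8 range. With per-tile scales, only the tile containing the outlier is affected, which is the usual motivation for fine-grained scaling.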

