CL AI SD AS | Sep 15, 2025

Fun-ASR Technical Report

Keyu An, Yanni Chen, Chong Deng, Changfeng Gao, Zhifu Gao, Bo Gong, Xiangang Li, Yabin Li, Xiang Lv, Yunjie Ji, Yiheng Jiang, Bin Ma

arXiv:2509.12508v3 | 18.813 citations | h-index: 18 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses the industry need for robust, deployable ASR systems, though it appears incremental: it combines existing paradigms with practical enhancements.

The paper tackles the problem of LLM-based ASR systems underperforming in real-world applications due to issues like hallucination, presenting Fun-ASR, which achieves state-of-the-art performance on real industry datasets through production-oriented optimizations.

In recent years, automatic speech recognition (ASR) has witnessed transformative advancements driven by three complementary paradigms: data scaling, model size scaling, and deep integration with large language models (LLMs). However, LLMs are prone to hallucination, which can significantly degrade user experience in real-world ASR applications. In this paper, we present Fun-ASR, a large-scale, LLM-based ASR system that synergistically combines massive data, large model capacity, LLM integration, and reinforcement learning to achieve state-of-the-art performance across diverse and complex speech recognition scenarios. Moreover, Fun-ASR is specifically optimized for practical deployment, with enhancements in streaming capability, noise robustness, code-switching, hotword customization, and other real-world application requirements. Experimental results show that while most LLM-based ASR systems achieve strong performance on open-source benchmarks, they often underperform on real industry evaluation sets. Thanks to production-oriented optimizations, Fun-ASR achieves state-of-the-art performance on real application datasets, demonstrating its effectiveness and robustness in practical settings.
