CL · Dec 4, 2024

PERL: Pinyin Enhanced Rephrasing Language Model for Chinese ASR N-best Error Correction

arXiv:2412.03230v2 · 3 citations · h-index: 1
AI Analysis

This work addresses the underuse of Pinyin information in Chinese ASR error correction, with the aim of improving Chinese speech recognition systems.

The paper tackles Chinese ASR error correction by proposing a Pinyin-enhanced language model (PERL) for N-best scenarios, reporting a 29.11% reduction in Character Error Rate (CER) on Aishell-1 and around 70% CER reduction on domain-specific datasets.

Existing Chinese ASR correction methods have not effectively utilized Pinyin information, a unique feature of the Chinese language. In this study, we address this gap by proposing a Pinyin Enhanced Rephrasing Language model (PERL) pipeline, designed explicitly for N-best correction scenarios. We conduct experiments on the Aishell-1 dataset and our newly proposed DoAD dataset. The results show that our approach outperforms baseline methods, achieving a 29.11% reduction in Character Error Rate on Aishell-1 and around 70% CER reduction on domain-specific datasets. PERL predicts the correct length of the output, leveraging the Pinyin information, which is embedded with a semantic model to perform phonetically similar corrections. Extensive experiments demonstrate the effectiveness of correcting wrong characters using N-best output and the low latency of our model.
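The Character Error Rate figures quoted above are relative reductions, so it may help to see how such a number is typically computed. The following is a minimal sketch, assuming the standard definition of CER (character-level edit distance divided by the number of reference characters); the paper's own scoring script is not shown here, and the function names and the toy sentences are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total character edits over total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return errors / chars


def relative_cer_reduction(baseline_cer: float, corrected_cer: float) -> float:
    """Fraction of the baseline CER removed by the correction step."""
    return (baseline_cer - corrected_cer) / baseline_cer


# Toy example (hypothetical data): one wrong character out of six.
toy_cer = cer(["今天天气很好"], ["今天天汽很好"])
```

Under this definition, a reported 29.11% reduction means the corrected CER is 29.11% lower than the baseline CER, not 29.11 absolute points lower.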

