Context-Enhanced Granular Edit Representation for Efficient and Accurate ASR Post-editing
This work addresses the need for efficient and accurate post-editing of ASR errors, which matters to both industry and end users, though the contribution appears incremental because it builds on prior compact edit representations.
The paper tackles the problem of inefficient and inaccurate ASR post-editing by introducing CEGER, a compact edit representation that enables LLMs to generate fine-grained edit commands, achieving state-of-the-art accuracy with the lowest word error rate (WER) among the compared approaches on the LibriSpeech dataset.
Although ASR technology has been widely adopted by industry and by a large portion of the population, ASR systems still make errors that require post-editing to improve text quality. While LLMs are powerful post-editing tools, baseline full-rewrite models are inefficient at inference time because they often regenerate text that is unchanged from the ASR output. Compact edit representations exist but often lack the efficacy and context needed for optimal accuracy. This paper introduces CEGER (Context-Enhanced Granular Edit Representation), a compact edit representation designed for highly accurate, efficient ASR post-editing. CEGER allows LLMs to generate a sequence of structured, fine-grained, contextually rich commands that modify the original ASR output. A separate expansion module then deterministically reconstructs the corrected text from these commands. In extensive experiments on the LibriSpeech dataset, CEGER achieves state-of-the-art accuracy, with the lowest word error rate (WER) compared with full rewrite and prior compact representations.
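To make the idea of a deterministic expansion step concrete, the following is a minimal sketch of how a list of context-anchored edit commands might be applied to an ASR hypothesis. The abstract does not specify CEGER's command syntax, so the `EditCommand` fields, the `find_span` helper, and the `expand` function are all hypothetical names chosen for illustration, and the left-context anchoring shown here is only one plausible reading of "contextually rich" commands, not the authors' actual format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EditCommand:
    """Hypothetical edit command; not the paper's actual format."""
    op: str                                   # "replace", "delete", or "insert"
    target: List[str]                         # ASR words to edit (empty for insert)
    replacement: List[str] = field(default_factory=list)
    left_context: List[str] = field(default_factory=list)  # words just before target


def find_span(words: List[str], pattern: List[str], start: int) -> int:
    """Index of the first occurrence of pattern in words at or after start, else -1."""
    n = len(pattern)
    for i in range(start, len(words) - n + 1):
        if words[i:i + n] == pattern:
            return i
    return -1


def expand(asr_text: str, commands: List[EditCommand]) -> str:
    """Apply commands left to right and return the corrected text."""
    words = asr_text.split()
    out: List[str] = []
    cursor = 0  # first ASR word not yet copied to the output
    for cmd in commands:
        anchor = cmd.left_context + cmd.target
        if not anchor:
            continue  # nothing to locate; skip the command
        pos = find_span(words, anchor, cursor)
        if pos < 0:
            continue  # anchor not found; leave the text unchanged
        edit_start = pos + len(cmd.left_context)
        edit_end = edit_start + len(cmd.target)
        out.extend(words[cursor:edit_start])  # copy untouched words verbatim
        if cmd.op in ("replace", "insert"):
            out.extend(cmd.replacement)
        cursor = edit_end                     # skip the edited ASR words
    out.extend(words[cursor:])
    return " ".join(out)


commands = [
    EditCommand("replace", ["see"], ["sea"], left_context=["sells"]),
    EditCommand("replace", ["see", "shore"], ["seashore"], left_context=["the"]),
]
corrected = expand("she sells see shells by the see shore", commands)
```

In this sketch the model would only need to emit the two short commands rather than the whole sentence, and the left context disambiguates which occurrence of a repeated word is meant. Commands whose anchor cannot be found are skipped, so a malformed command leaves the ASR output unchanged at that point.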