Contextualized Token Discrimination for Speech Search Query Correction
This work addresses the problem of improving search accuracy for speech search users by correcting ASR errors, though the contribution appears incremental because it builds on existing BERT-based approaches.
The paper tackles the correction of spelling errors in speech search queries produced by Automated Speech Recognition (ASR) systems. It introduces a method called Contextualized Token Discrimination (CTD), which uses BERT to produce contextualized token representations and a composition layer to enhance their semantic information. The authors report superior performance across all metrics in their experiments.
Query spelling correction is an important function of modern search engines since it effectively helps users express their intentions clearly. With the growing popularity of speech search driven by Automated Speech Recognition (ASR) systems, this paper introduces a novel method named Contextualized Token Discrimination (CTD) to conduct effective speech query correction. In CTD, we first employ BERT to generate token-level contextualized representations and then construct a composition layer to enhance semantic information. Finally, we produce the correct query according to the aggregated token representation, correcting the incorrect tokens by comparing the original token representations and the contextualized representations. Extensive experiments demonstrate the superior performance of our proposed method across all metrics, and we further present a new benchmark dataset with erroneous ASR transcriptions to offer comprehensive evaluations for audio query correction.
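The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: hand-written vectors stand in for the original token embeddings (`VOCAB_EMB`) and for the BERT contextualized outputs (`CONTEXTUAL`), the composition layer is omitted, and the names `cosine`, `correct_query`, the `threshold` of 0.8, and the nearest-neighbour replacement rule are all hypothetical choices made for illustration.

```python
from math import sqrt

# Hypothetical static embeddings standing in for the original token representations.
VOCAB_EMB = {
    "weather":  [1.0, 0.0, 0.0, 0.0],
    "whether":  [0.0, 1.0, 0.0, 0.0],
    "forecast": [0.0, 0.0, 1.0, 0.0],
    "today":    [0.0, 0.0, 0.0, 1.0],
}

# Hand-written stand-ins for contextualized vectors (one per query position).
# In CTD these would come from BERT; here they are assumed values.
CONTEXTUAL = [
    [0.9, 0.3, 0.1, 0.0],   # context pulls position 0 toward "weather"
    [0.1, 0.0, 0.95, 0.1],
    [0.0, 0.1, 0.1, 0.9],
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def correct_query(tokens, contextual_vecs, vocab_emb, threshold=0.8):
    """Keep a token if its original embedding agrees with its contextual vector;
    otherwise replace it with the vocabulary token closest to the contextual vector.
    The threshold and the replacement rule are illustrative assumptions."""
    corrected = []
    for token, ctx in zip(tokens, contextual_vecs):
        if cosine(vocab_emb[token], ctx) >= threshold:
            corrected.append(token)
        else:
            best = max(vocab_emb, key=lambda w: cosine(vocab_emb[w], ctx))
            corrected.append(best)
    return corrected


print(correct_query(["whether", "forecast", "today"], CONTEXTUAL, VOCAB_EMB))
# ['weather', 'forecast', 'today']
```

In this toy query, the ASR output "whether" disagrees with its context-derived vector and is replaced by "weather", while the other two tokens are left unchanged. A trained model would learn this discrimination rather than rely on a fixed similarity threshold.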