Hybrid phonetic-neural model for correction in speech recognition systems
This work addresses error reduction for closed ASR systems in domain-specific language settings. Its contribution is incremental, since it builds on existing post-processing strategies.
The paper tackles speech recognition errors in domain-specific language environments by combining a phonetic correction algorithm with a deep neural network, and it reports a reduction in word error rate (WER) on a telesales audio database.
Automatic speech recognition (ASR) is relevant in many settings because it provides a natural communication mechanism between applications and users. ASR systems often fail in environments that use language specific to a particular application domain. Several strategies have been explored to reduce errors in closed ASR systems through post-processing, notably automatic spell checking and deep learning approaches. In this article, we explore using a deep neural network to refine the results of a phonetic correction algorithm applied to a telesales audio database. The results show a reduction in the word error rate (WER) relative to both the original transcription and the phonetically corrected output, which supports the viability of combining deep learning models with post-processing correction strategies to reduce the errors made by closed ASR systems in specific language domains.
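
Because the results are reported as WER, it may help to recall how that metric is usually computed: the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the telesales database.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes whitespace tokenization; real evaluations usually also
    normalize case and punctuation first (not done here).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences, for illustration only.
reference = "please confirm your credit card number"
asr_output = "please confirm your credit cart number"
print(word_error_rate(reference, asr_output))
```

Under this definition, a post-processing step such as the phonetic correction or the neural refinement described above lowers WER whenever its output needs fewer word-level edits to match the reference than the raw ASR transcription does.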