Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
This work addresses robustness evaluation for NLU models in speech-based applications. Its contribution is incremental, since it builds on existing error analysis techniques.
The paper tackles the problem of evaluating how speech recognition errors affect natural language understanding (NLU) models in spoken dialogue systems. It proposes a method that combines back transcription with fine-grained error categorization, and it shows that using synthesized speech instead of audio recordings does not significantly change the results.
In a spoken dialogue system, an NLU model is preceded by a speech recognition system whose errors can degrade the performance of natural language understanding. This paper proposes a method for investigating the impact of speech recognition errors on the performance of NLU models. The proposed method combines the back transcription procedure with a fine-grained technique for categorizing the errors that affect NLU performance. The method relies on synthesized speech for NLU evaluation. We show that using synthesized speech in place of audio recordings does not significantly change the outcomes of the presented technique.
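
The back transcription loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `back_transcription_eval`, `categorize`, `toy_tts`, `toy_asr`, and `toy_nlu` are hypothetical, the toy components stand in for real speech synthesis, speech recognition, and NLU models, and the four category labels are a simple placeholder rather than the paper's fine-grained taxonomy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Outcome:
    reference: str       # original text utterance
    hypothesis: str      # text after the TTS -> ASR round trip
    intent_before: str   # NLU output on the reference text
    intent_after: str    # NLU output on the back-transcribed text
    category: str        # illustrative label, see categorize()


def categorize(gold: str, before: str, after: str) -> str:
    """Hypothetical coarse categories; the paper's scheme is finer-grained."""
    if before == after:
        return "unchanged"
    if before == gold:
        return "correct_to_incorrect"
    if after == gold:
        return "incorrect_to_correct"
    return "incorrect_to_incorrect"


def back_transcription_eval(
    samples: List[Tuple[str, str]],
    tts: Callable[[str], bytes],
    asr: Callable[[bytes], str],
    nlu: Callable[[str], str],
) -> List[Outcome]:
    """Synthesize each utterance, transcribe it back, and compare NLU outputs."""
    outcomes = []
    for text, gold in samples:
        hypothesis = asr(tts(text))
        before, after = nlu(text), nlu(hypothesis)
        outcomes.append(
            Outcome(text, hypothesis, before, after, categorize(gold, before, after))
        )
    return outcomes


# Toy stand-ins (assumptions): real systems would replace all three.
def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the bytes are audio


def toy_asr(audio: bytes) -> str:
    # Simulate a single recognition error: "lights" is misheard as "likes".
    return audio.decode("utf-8").replace("lights", "likes")


def toy_nlu(text: str) -> str:
    if "lights" in text:
        return "lights_on"
    if "weather" in text:
        return "get_weather"
    return "unknown"


samples = [
    ("turn on the lights", "lights_on"),
    ("what is the weather", "get_weather"),
]
outcomes = back_transcription_eval(samples, toy_tts, toy_asr, toy_nlu)
for o in outcomes:
    print(f"{o.reference!r} -> {o.hypothesis!r}: {o.category}")
```

Passing the speech components in as callables keeps the loop independent of any particular engine, so the same evaluation could in principle be run once with synthesized speech and once with recorded audio to compare the resulting category counts.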