Evaluating Automatic Speech Recognition Systems in Comparison With Human Perception Results Using Distinctive Feature Measures
This work addresses the need of researchers and developers for more nuanced evaluation of ASR systems, though its contribution is incremental, since it builds on existing distinctive feature theory.
The paper tackles the problem of evaluating automatic speech recognition (ASR) systems by comparing them with human perception using linguistic distinctive features, yielding methods that provide a more detailed performance profile than conventional phone- or word-level criteria.
This paper describes methods for evaluating automatic speech recognition (ASR) systems in comparison with human perception results, using measures derived from linguistic distinctive features. Error patterns in terms of manner, place and voicing are presented, along with an examination of confusion matrices via a distinctive-feature-distance metric. These evaluation methods contrast with conventional performance criteria that focus on the phone or word level, and are intended to provide a more detailed profile of ASR system performance, as well as a means for direct comparison with human perception results at the sub-phonemic level.
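As a rough illustration of the kind of measure described above, the following Python sketch computes per-feature error rates (manner, place, voicing) and a mean distinctive-feature distance from a confusion matrix. It is not the paper's implementation: the feature table `FEATURES`, the three-feature simplification, the unweighted count-of-differing-features distance, the function names, and the confusion counts are all assumptions made for the example.

```python
# Hypothetical, highly simplified feature table: (manner, place, voicing).
# A real distinctive-feature system would use a richer feature set.
FEATURES = {
    "p": ("stop", "labial", "voiceless"),
    "b": ("stop", "labial", "voiced"),
    "t": ("stop", "alveolar", "voiceless"),
    "d": ("stop", "alveolar", "voiced"),
    "s": ("fricative", "alveolar", "voiceless"),
    "z": ("fricative", "alveolar", "voiced"),
}
FEATURE_NAMES = ("manner", "place", "voicing")


def feature_distance(a, b):
    """Number of features on which phones a and b differ (assumed metric)."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


def feature_error_profile(confusions):
    """Summarize a confusion matrix at the feature level.

    confusions maps (spoken, recognized) phone pairs to counts.
    Returns (per-feature error rates, mean feature distance per token).
    """
    total = sum(confusions.values())
    errors = dict.fromkeys(FEATURE_NAMES, 0)
    distance_sum = 0
    for (spoken, heard), count in confusions.items():
        pairs = zip(FEATURE_NAMES, FEATURES[spoken], FEATURES[heard])
        for name, x, y in pairs:
            if x != y:
                errors[name] += count
        distance_sum += count * feature_distance(spoken, heard)
    rates = {name: n / total for name, n in errors.items()}
    return rates, distance_sum / total


# Made-up confusion counts, for illustration only.
asr_confusions = {
    ("p", "p"): 8, ("p", "b"): 1, ("p", "t"): 1,
    ("s", "s"): 9, ("s", "z"): 1,
}

rates, mean_distance = feature_error_profile(asr_confusions)
print(rates, mean_distance)
```

Under these assumptions, the same function could be applied to a confusion matrix from a human listening experiment, so that the two feature-level profiles can be compared side by side rather than only a single phone or word error rate.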