A Speech Test Set of Practice Business Presentations with Additional Relevant Texts
This work provides a domain-specific dataset for ASR evaluation, but the contribution is incremental: it targets a narrow scenario and offers no broad methodological advance.
The authors address the evaluation of automatic speech recognition (ASR) systems in domain-specific contexts. They create a test corpus of 39 short English presentations by non-native high school students, accompanied by the corresponding slides and web pages, and benchmark three baseline ASR systems on it, showing that these systems are still imperfect on this data.
We present a test corpus of audio recordings and transcriptions of presentations of students' enterprises together with their slides and web pages. The corpus is intended for the evaluation of automatic speech recognition (ASR) systems, especially in conditions where the prior availability of in-domain vocabulary and named entities is beneficial. The corpus consists of 39 presentations in English, each up to 90 seconds long. The speakers are high school students from European countries with English as their second language. We benchmark three baseline ASR systems on the corpus and show that they remain imperfect.
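The text above does not say which metric is used for the benchmark. As a minimal sketch, assuming the common word error rate (WER) is the measure of interest, scoring one ASR hypothesis against a reference transcript could look like the following. The function name, the lowercase-and-split normalization, and the example sentences are illustrative assumptions, not details taken from the corpus or from the authors' evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized only by lowercasing and whitespace
    splitting; a real evaluation would likely also handle punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: a compound word split by the recognizer.
    print(word_error_rate("we sell handmade soap", "we sell hand made soap"))
```

Under this definition a single misrecognized company or product name counts as at least one word error, which is one way the in-domain vocabulary from slides and web pages could show up in the scores.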