AS, AI, CL, SD | Feb 1, 2022

BEA-Base: A Benchmark for ASR of Spontaneous Hungarian

arXiv:2202.00601v1584 citations | Has Code
Originality: Synthesis-oriented
AI Analysis

The paper provides a benchmark for ASR of spontaneous Hungarian. The contribution is incremental: it addresses a domain-specific gap in resources.

The paper tackles the lack of accessible benchmark datasets for Automatic Speech Recognition (ASR) of spontaneous Hungarian by introducing BEA-Base, a subset of the BEA spoken Hungarian database comprising mostly spontaneous speech from 140 speakers. It reports that the best method, based on multilingual self-supervised pretraining, achieved a 45% reduction in recognition error rate compared to the classical HMM-DNN hybrid approach.

Hungarian is spoken by 15 million people, yet easily accessible Automatic Speech Recognition (ASR) benchmark datasets, especially for spontaneous speech, have been practically unavailable. In this paper, we introduce BEA-Base, a subset of the BEA spoken Hungarian database comprising mostly spontaneous speech of 140 speakers. It is built specifically to assess ASR, primarily for conversational AI applications. After defining the speech recognition subsets and task, several baselines, including classic HMM-DNN hybrid and end-to-end approaches augmented by cross-language transfer learning, are developed using open-source toolkits. The best results are obtained with multilingual self-supervised pretraining, which achieves a 45% recognition error rate reduction compared to the classical approach, without the application of an external language model or additional supervised data. The results show the feasibility of using BEA-Base for training and evaluation of Hungarian speech recognition systems.
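
To make the "45% recognition error rate reduction" figure concrete, here is a minimal sketch of how such a reduction is commonly computed when it is meant as a relative reduction. The abstract does not state whether the figure is relative or absolute, so the relative reading is an assumption. The function name `relative_error_reduction` and both error rates are hypothetical illustrations and are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, improved_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - improved_error) / baseline_error


# Hypothetical error rates in percent, chosen only for illustration.
# They are NOT results reported for BEA-Base.
baseline = 20.0   # assumed error rate of a classical hybrid system
improved = 11.0   # assumed error rate of a pretrained end-to-end system

reduction = relative_error_reduction(baseline, improved)
print(f"Relative error rate reduction: {reduction:.0%}")
```

Under this reading, a drop from 20% to 11% is a 9-point absolute reduction but a 45% relative reduction, which is why the two interpretations should not be confused when comparing systems.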

