AS CL LG Jun 27, 2023

Confidence-based Ensembles of End-to-End Speech Recognition Models

Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg

arXiv:2306.15824v15.19 citations h-index: 32

Originality: Incremental advance

AI Analysis

This work addresses the proliferation of expert speech recognition models by combining them for better cross-domain performance. The advance is incremental because it builds on existing ensemble and confidence estimation methods.

The paper tackles the problem of combining multiple domain-specific end-to-end speech recognition models when target data is unavailable except for a small validation set. It uses confidence-based ensembles, in which only the output of the most confident model is selected. It shows that a confidence-based ensemble of 5 monolingual models outperforms a system that selects the model with a dedicated language identification block, and that base and adapted models can be combined for strong performance on both original and target data. Results are validated on multiple datasets and model architectures.

The number of end-to-end speech recognition models grows every year. These models are often adapted to new domains or languages, resulting in a proliferation of expert systems that achieve great results on target data while generally showing inferior performance outside of their domain of expertise. We explore the combination of such experts via confidence-based ensembles: ensembles of models where only the output of the most confident model is used. We assume that the models' target data is not available except for a small validation set. We demonstrate the effectiveness of our approach with two applications. First, we show that a confidence-based ensemble of 5 monolingual models outperforms a system where model selection is performed via a dedicated language identification block. Second, we demonstrate that it is possible to combine base and adapted models to achieve strong results on both original and target data. We validate all our results on multiple datasets and model architectures.
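
The selection rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Hypothesis` container, the `mean_max_prob` confidence score, and the toy probabilities are all assumptions made for this example. The abstract does not say how confidence is computed or calibrated.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    """Output of one expert model (hypothetical container for this sketch)."""
    model_name: str
    transcript: str
    confidence: float


def mean_max_prob(frame_probs: Sequence[Sequence[float]]) -> float:
    """Assumed confidence score: average of the per-frame maximum probability.

    This is a stand-in measure chosen for simplicity; it is not taken
    from the paper.
    """
    if not frame_probs:
        return 0.0
    return sum(max(frame) for frame in frame_probs) / len(frame_probs)


def select_most_confident(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis of the most confident model.

    Only this single output is used; the other models' outputs are dropped.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    return max(hypotheses, key=lambda h: h.confidence)


if __name__ == "__main__":
    # Toy per-frame probability distributions for two hypothetical experts.
    english = Hypothesis("en", "hello world",
                         mean_max_prob([[0.9, 0.1], [0.8, 0.2]]))
    german = Hypothesis("de", "hallo welt",
                        mean_max_prob([[0.6, 0.4], [0.5, 0.5]]))
    best = select_most_confident([english, german])
    print(best.model_name, best.transcript)
```

In this sketch every expert transcribes the same utterance, and the ensemble keeps only the transcript whose model reports the highest score. Raw scores from different models may not be directly comparable, which is one plausible use for the small validation set mentioned in the abstract.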
