CL · SD · AS · May 13, 2022

Unified Modeling of Multi-Domain Multi-Device ASR Systems

arXiv:2205.06655v3 · 4 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses the inefficiency of maintaining separate ASR models for different domains and devices, offering a more streamlined solution for speech recognition systems.

The paper addresses the overhead of managing multiple domain-specific and device-specific ASR models by proposing a single unified model that combines domain embedding, domain experts, mixture of experts, and adversarial training. The unified model achieves relative accuracy gains of up to 10% over baseline models with a negligible increase in parameters.

Modern Automatic Speech Recognition (ASR) systems often use a portfolio of domain-specific models in order to get high accuracy for distinct user utterance types across different devices. In this paper, we propose an innovative approach that integrates the different per-domain per-device models into a unified model, using a combination of domain embedding, domain experts, mixture of experts and adversarial training. We run careful ablation studies to show the benefit of each of these innovations in contributing to the accuracy of the overall unified model. Experiments show that our proposed unified modeling approach actually outperforms the carefully tuned per-domain models, giving relative gains of up to 10% over a baseline model with negligible increase in the number of parameters.
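
To make the combination of a domain embedding with a mixture of experts more concrete, here is a minimal sketch in plain Python. It is not the paper's architecture: the abstract does not say how the components are wired together, so the class name `DomainMoELayer`, the linear experts, the choice to feed the domain embedding only into the gate, and all dimensions are assumptions made for illustration. Adversarial training is left out entirely.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class DomainMoELayer:
    """Hypothetical layer: experts mixed by a gate that sees a domain embedding."""

    def __init__(self, feat_dim, num_domains, emb_dim, num_experts, seed=0):
        rng = random.Random(seed)
        # One embedding vector per domain/device ID (assumed lookup table).
        self.domain_emb = random_matrix(num_domains, emb_dim, rng)
        # Each expert is a single linear map here, purely for brevity.
        self.experts = [random_matrix(feat_dim, feat_dim, rng)
                        for _ in range(num_experts)]
        # The gate scores every expert from [features ; domain embedding].
        self.gate = random_matrix(num_experts, feat_dim + emb_dim, rng)

    def gate_weights(self, features, domain_id):
        gate_input = features + self.domain_emb[domain_id]  # list concatenation
        return softmax(matvec(self.gate, gate_input))

    def forward(self, features, domain_id):
        weights = self.gate_weights(features, domain_id)
        outputs = [matvec(expert, features) for expert in self.experts]
        # Weighted sum of expert outputs, one value per feature dimension.
        return [sum(w * out[i] for w, out in zip(weights, outputs))
                for i in range(len(features))]


layer = DomainMoELayer(feat_dim=4, num_domains=3, emb_dim=2, num_experts=3)
frame = [0.5, -1.0, 0.25, 2.0]  # stand-in for one acoustic feature frame
weights = layer.gate_weights(frame, domain_id=1)
output = layer.forward(frame, domain_id=1)
```

In this sketch the experts are shared across all domains and only the gate depends on the domain embedding, which is one way a single model could serve several domains while adding few parameters. Whether the paper shares parameters this way is not stated in the abstract.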

