AS AI CL SD · Apr 2, 2018

Speaker-Invariant Training via Adversarial Learning

arXiv:1804.00732v3 · 115 citations
Originality: Incremental advance
AI Analysis

The paper addresses speaker variability in ASR systems with a method that improves robustness without explicit speaker-independent transformations. The advance is incremental, since it builds on existing adversarial and multi-task learning techniques.

The paper tackles the problem of speaker variability in automatic speech recognition by proposing speaker-invariant training (SIT), an adversarial multi-task learning scheme that reduces inter-talker feature variability while enhancing senone discriminability, resulting in a 4.99% relative word error rate improvement on the CHiME-3 dataset.
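For readers unfamiliar with the metric, a relative WER improvement is conventionally the WER reduction expressed as a fraction of the baseline WER. The symbols below are illustrative labels, not notation taken from the paper, and the absolute WER values are not given in this summary.

```latex
\text{relative WER improvement}
  = \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{baseline}}} \times 100\%
```

Under this definition, a 4.99% relative improvement means the SIT model removes roughly one twentieth of the baseline SI model's word errors, not that the absolute WER drops by 4.99 points.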

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system. We call the scheme speaker-invariant training (SIT). In SIT, a DNN acoustic model and a speaker classifier network are jointly optimized to minimize the senone (tied triphone state) classification loss, and simultaneously mini-maximize the speaker classification loss. A speaker-invariant and senone-discriminative deep feature is learned through this adversarial multi-task learning. With SIT, a canonical DNN acoustic model with significantly reduced variance in its output probabilities is learned with no explicit speaker-independent (SI) transformations or speaker-specific representations used in training or testing. Evaluated on the CHiME-3 dataset, the SIT achieves 4.99% relative word error rate (WER) improvement over the conventional SI acoustic model. With additional unsupervised speaker adaptation, the speaker-adapted (SA) SIT model achieves 4.86% relative WER gain over the SA SI acoustic model.
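The joint optimization described in the abstract can be illustrated with a toy, pure-Python sketch of one adversarial multi-task SGD step. This is not the authors' implementation: the linear feature extractor `W`, the binary logistic heads `v` (senone) and `u` (speaker), the function name `sit_step`, and the trade-off weight `lam` are all assumptions made for illustration, whereas the paper uses deep networks and multi-class senone and speaker targets. The sketch only shows the sign structure of the updates: both classifier heads descend their own loss, while the shared feature extractor descends the senone loss and ascends the speaker loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sit_step(W, v, u, x, senone, speaker, lr=0.1, lam=0.5):
    """One SGD step on a single frame (toy setting, hypothetical names).

    W: feature-extractor weights (list of rows), shared by both tasks
    v: senone-classifier weights, u: speaker-classifier weights
    x: input vector; senone, speaker: binary labels (0 or 1)
    lam: weight of the adversarial speaker term (assumed hyperparameter)
    """
    # Forward pass: shared deep feature f = W x, then two logistic heads.
    f = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    p_senone = sigmoid(sum(vi * fi for vi, fi in zip(v, f)))
    p_speaker = sigmoid(sum(ui * fi for ui, fi in zip(u, f)))

    # For binary cross-entropy, d(loss)/d(logit) = prediction - label.
    g_senone = p_senone - senone
    g_speaker = p_speaker - speaker

    # Gradients of each loss with respect to the shared feature f.
    df_senone = [g_senone * vi for vi in v]
    df_speaker = [g_speaker * ui for ui in u]

    # Each head minimizes its own classification loss.
    new_v = [vi - lr * g_senone * fi for vi, fi in zip(v, f)]
    new_u = [ui - lr * g_speaker * fi for ui, fi in zip(u, f)]

    # The feature extractor minimizes the senone loss but MAXIMIZES the
    # speaker loss: the speaker gradient enters with a reversed sign.
    new_W = [
        [w - lr * (dy - lam * ds) * xj for w, xj in zip(row, x)]
        for row, dy, ds in zip(W, df_senone, df_speaker)
    ]
    return new_W, new_v, new_u


if __name__ == "__main__":
    W = [[0.5, -0.2], [0.1, 0.3]]
    v, u, x = [0.4, -0.6], [0.7, 0.2], [1.0, 2.0]
    W, v, u = sit_step(W, v, u, x, senone=1, speaker=0)
    print(W, v, u)
```

Setting `lam=0` reduces the step to ordinary senone training with a speaker classifier attached on the side; increasing `lam` pushes the shared feature toward being less informative about the speaker. In automatic-differentiation frameworks the same sign flip is commonly implemented with a gradient reversal layer.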

