CL · LG · AS · Jul 10, 2019

Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition

arXiv:1907.04882v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work makes an incremental improvement to acoustic model optimization for ASR systems, which may benefit speech recognition applications.

The paper tackles the optimization of acoustic models for automatic speech recognition (ASR). It proposes a variant of Evolutionary Stochastic Gradient Descent (ESGD) that places a well-trained model in the parent population as an anchor, so that the best fitness of the population never falls below that of the anchor. On the BN50 and SWB300 datasets, the method improves both loss and ASR performance over the existing well-trained acoustic models.

Evolutionary stochastic gradient descent (ESGD) was proposed as a population-based approach that combines the merits of gradient-aware and gradient-free optimization algorithms for superior overall optimization performance. In this paper we investigate a variant of ESGD for optimization of acoustic models for automatic speech recognition (ASR). In this variant, we assume the existence of a well-trained acoustic model and use it as an anchor in the parent population whose good "gene" will propagate in the evolution to the offspring. We propose an ESGD algorithm leveraging the anchor models such that it guarantees the best fitness of the population will never degrade from the anchor model. Experiments on 50-hour Broadcast News (BN50) and 300-hour Switchboard (SWB300) show that the ESGD with anchors can further improve the loss and ASR performance over the existing well-trained acoustic models.
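The anchor idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes a quadratic stand-in for the acoustic-model loss, averaging plus Gaussian mutation as the gradient-free step, and elitist selection over parents, SGD-trained individuals, and offspring; the abstract does not specify any of these details. All names (`loss`, `sgd_step`, `esgd_with_anchor`) and hyperparameters are hypothetical. Because the anchor starts in the parent pool and the fittest candidates always survive, the best fitness cannot get worse than the anchor's.

```python
import random


def loss(w):
    # Toy fitness (lower is better); a stand-in for the acoustic-model loss.
    return sum((x - 3.0) ** 2 for x in w)


def sgd_step(w, lr, rng):
    # One noisy gradient step; the Gaussian noise mimics minibatch sampling.
    return [x - lr * (2.0 * (x - 3.0) + rng.gauss(0.0, 0.1)) for x in w]


def esgd_with_anchor(anchor, pop_size=8, generations=5, sgd_steps=10, seed=0):
    rng = random.Random(seed)
    # Parent population: the anchor itself plus perturbed copies of it (assumption).
    population = [list(anchor)] + [
        [x + rng.gauss(0.0, 0.5) for x in anchor] for _ in range(pop_size - 1)
    ]
    history = [min(loss(w) for w in population)]

    for _gen in range(generations):
        # Gradient-aware phase: each individual trains with its own learning rate.
        trained = []
        for w in population:
            lr = rng.choice([0.01, 0.05, 0.1])
            candidate = list(w)
            for _step in range(sgd_steps):
                candidate = sgd_step(candidate, lr, rng)
            trained.append(candidate)

        # Gradient-free phase: recombine two trained parents and mutate the child.
        offspring = []
        for _child in range(pop_size):
            a, b = rng.sample(trained, 2)
            offspring.append(
                [(x + y) / 2.0 + rng.gauss(0.0, 0.05) for x, y in zip(a, b)]
            )

        # Elitist selection: the previous parents stay in the candidate pool,
        # so the best fitness is non-increasing across generations.
        candidates = sorted(population + trained + offspring, key=loss)
        population = candidates[:pop_size]
        history.append(loss(population[0]))

    return population[0], history


if __name__ == "__main__":
    anchor_weights = [0.0, 0.0, 0.0]
    best, best_loss_per_generation = esgd_with_anchor(anchor_weights)
    print("anchor loss:", loss(anchor_weights))
    print("best loss per generation:", best_loss_per_generation)
```

The guarantee in this sketch comes entirely from keeping the parents in the selection pool. If selection were restricted to the offspring, a bad generation could lose the anchor's fitness.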

