CL · Aug 2, 2016

Efficient Segmental Cascades for Speech Recognition

arXiv:1608.00929v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses computational bottlenecks for researchers and practitioners using segmental models in speech recognition, though it is incremental as it builds on existing cascade methods.

The paper tackles the computational cost of discriminative segmental models in speech recognition by proposing efficient segmental cascades that combine a reduced first-pass feature set, frame subsampling, and pruning. On phonetic recognition, it reports competitive performance while greatly reducing decoding, pruning, and training time.

Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition. However, their appeal has been limited by their computational requirements, due to the large number of possible segments to consider. Multi-pass cascades of segmental models introduce features of increasing complexity in different passes, where in each pass a segmental model rescores lattices produced by a previous (simpler) segmental model. In this paper, we explore several ways of making segmental cascades efficient and practical: reducing the feature set in the first pass, frame subsampling, and various pruning approaches. In experiments on phonetic recognition, we find that with a combination of such techniques, it is possible to maintain competitive performance while greatly reducing decoding, pruning, and training time.
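The cascade idea in the abstract can be illustrated with a toy sketch: a cheap first pass scores every segment, pruning keeps only segments that lie on a near-best path, and a richer second pass rescores the surviving lattice. Everything here is an assumption made for illustration and is not taken from the paper: the names (`all_segments`, `max_marginal_prune`, `decode`), the toy frame scores, the duration bonus standing in for a more expensive feature, and the fixed-beam pruning rule.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int, str]  # (start frame, end frame (exclusive), label)
NEG_INF = float("-inf")


def all_segments(num_frames: int, max_dur: int, labels: List[str]) -> List[Segment]:
    """Enumerate every labelled segment of at most max_dur frames."""
    return [(s, e, y)
            for s in range(num_frames)
            for e in range(s + 1, min(s + max_dur, num_frames) + 1)
            for y in labels]


def max_marginal_prune(edges: List[Segment], score: Dict[Segment, float],
                       num_frames: int, beam: float) -> List[Segment]:
    """Keep segments whose best complete path is within `beam` of the overall best."""
    alpha = [NEG_INF] * (num_frames + 1)  # best score of a path from frame 0 to t
    beta = [NEG_INF] * (num_frames + 1)   # best score of a path from t to the end
    alpha[0] = 0.0
    beta[num_frames] = 0.0
    for s, e, y in sorted(edges, key=lambda seg: seg[1]):
        alpha[e] = max(alpha[e], alpha[s] + score[(s, e, y)])
    for s, e, y in sorted(edges, key=lambda seg: seg[0], reverse=True):
        beta[s] = max(beta[s], score[(s, e, y)] + beta[e])
    best = alpha[num_frames]
    return [seg for seg in edges
            if alpha[seg[0]] + score[seg] + beta[seg[1]] >= best - beam]


def decode(edges: List[Segment], score: Dict[Segment, float],
           num_frames: int) -> List[Segment]:
    """Viterbi search over the given segments; returns the best segmentation."""
    best = [NEG_INF] * (num_frames + 1)
    back: List = [None] * (num_frames + 1)
    best[0] = 0.0
    for seg in sorted(edges, key=lambda seg: seg[1]):
        cand = best[seg[0]] + score[seg]
        if cand > best[seg[1]]:
            best[seg[1]], back[seg[1]] = cand, seg
    path, t = [], num_frames
    while t > 0:  # follow back-pointers from the last frame to frame 0
        path.append(back[t])
        t = back[t][0]
    return path[::-1]


# Toy input: one made-up score per frame; label "a" prefers low values, "b" high.
frames = [0.1, 0.2, 0.9, 0.8]


def frame_score(x: float, label: str) -> float:
    return 1.0 - x if label == "a" else x


edges = all_segments(len(frames), max_dur=2, labels=["a", "b"])

# Pass 1: cheap feature set (sum of per-frame scores), then prune.
first_pass = {seg: sum(frame_score(x, seg[2]) for x in frames[seg[0]:seg[1]])
              for seg in edges}
lattice = max_marginal_prune(edges, first_pass, len(frames), beam=0.5)

# Pass 2: add a (hypothetical) duration feature, computed only for survivors.
second_pass = {seg: first_pass[seg] + 0.3 * (seg[1] - seg[0] - 1) for seg in lattice}
path = decode(lattice, second_pass, len(frames))
print(len(edges), len(lattice), path)
```

The point of the design is that the second-pass feature is evaluated only on the pruned lattice, so its cost scales with the number of surviving segments rather than with all possible segments.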

