AS CL LG ML · Feb 24, 2022

Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech

Quan Wang, Yang Yu, Jason Pelecanos, Yiling Huang, Ignacio Lopez Moreno

arXiv:2202.12163v4 · 4.315 citations

Originality: Incremental advance

AI Analysis

This work addresses language identification for long-form audio and supports streaming inference, but it is an incremental advance because it builds on existing conformer architectures.

The paper tackles language identification in long-form speech by proposing a conformer-based system with attentive temporal pooling for streaming inference and domain adaptation methods, achieving significant performance improvements over LSTM and transformer models.

In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism to allow the model to carry information in long-form audio via a recurrent form, such that the inference can be performed in a streaming fashion. Additionally, we investigate two domain adaptation approaches to allow adapting an existing language identification model without retraining the model parameters for a new domain. We perform a comparative study of different model topologies under different constraints of model size, and find that conformer-based models significantly outperform LSTM and transformer based models. Our experiments also show that attentive temporal pooling and domain adaptation improve model accuracy.
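The abstract does not give the pooling equations, so what follows is only a minimal sketch under stated assumptions. Each frame embedding h_t (for example, a conformer output) receives a scalar score from a hypothetical attention vector `v`, and the utterance embedding is the softmax-weighted average of all frames seen so far. Because that average is a ratio of two running sums, it can be updated one frame at a time with constant memory, which is what lets inference proceed in a streaming, recurrent fashion over long-form audio. The names `frame_score`, `StreamingAttentivePooling`, and `attentive_pool`, the dot-product scoring, and the running-max rescaling (the same trick used in online softmax) are illustrative choices, not the authors' implementation.

```python
import math
from typing import List, Sequence


def frame_score(v: Sequence[float], h: Sequence[float]) -> float:
    """Hypothetical scalar attention score: a plain dot product v . h."""
    return sum(a * b for a, b in zip(v, h))


class StreamingAttentivePooling:
    """Softmax-weighted average of frames, updated one frame at a time.

    Keeps only O(dim) state: a running max score, a running weight sum
    (denominator), and a running weighted sum of frames (numerator).
    """

    def __init__(self, v: Sequence[float]):
        self.v = list(v)
        self.max_score = -math.inf
        self.denom = 0.0
        self.numer = [0.0] * len(self.v)

    def update(self, h: Sequence[float]) -> List[float]:
        """Consume one frame embedding and return the pooled embedding so far."""
        score = frame_score(self.v, h)
        new_max = max(self.max_score, score)
        # Rescale the old sums to the new max; exp(-inf) == 0.0 on the first frame.
        scale = math.exp(self.max_score - new_max)
        weight = math.exp(score - new_max)
        self.denom = self.denom * scale + weight
        self.numer = [n * scale + weight * x for n, x in zip(self.numer, h)]
        self.max_score = new_max
        return [n / self.denom for n in self.numer]


def attentive_pool(frames: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Full-sequence (non-streaming) reference for the same pooling."""
    scores = [frame_score(v, h) for h in frames]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(frames[0])
    return [sum(w * h[d] for w, h in zip(weights, frames)) / total for d in range(dim)]


if __name__ == "__main__":
    v = [0.5, -0.25]
    frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
    pool = StreamingAttentivePooling(v)
    for h in frames:
        pooled = pool.update(h)  # a pooled embedding is available after every frame
    offline = attentive_pool(frames, v)
    # Streaming and full-sequence results agree up to floating-point rounding.
    assert all(abs(a - b) < 1e-9 for a, b in zip(pooled, offline))
```

A trained model would presumably score frames with a learned, possibly nonlinear network; swapping that in changes only `frame_score`, not the running-sum update that makes the pooling streamable.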

