ASCL
Apr 24, 2021

Language ID Prediction from Speech Using Self-Attentive Pooling and 1D-Convolutions

arXiv:2104.11985v1
2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for domain and speaker-invariant language ID systems in multilingual ASR pipelines, especially for low-resource and endangered languages, but appears incremental.

The paper tackled the problem of language identification from speech, particularly for low-resource languages with single-speaker recordings, and reported promising results using a convolutional neural network with a self-attentive pooling layer.

This memo describes the NTR-TSU submission to the SIGTYP 2021 Shared Task on predicting language IDs from speech. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, creating a need for domain- and speaker-invariant language ID systems. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer achieves promising results on the language identification task.
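
The abstract names a Self-Attentive Pooling layer but does not spell out its form here. As a rough illustration only, the sketch below uses one common formulation: each frame vector is scored through a small tanh projection, the scores are softmax-normalized over time, and the frames are averaged with those weights so that an utterance of any length becomes one fixed-size vector. The function name `self_attentive_pool` and the parameters `W`, `b`, and `v` are hypothetical and are not taken from the memo; the authors' layer may differ.

```python
import math

def self_attentive_pool(frames, W, b, v):
    """Pool T frame vectors (each of length D) into a single length-D vector.

    frames: list of T feature vectors, e.g. outputs of a 1D-convolution stack.
    W (A x D), b (length A), v (length A): hypothetical learned parameters.
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in frames:
        # Hidden projection u = tanh(W h + b)
        hidden = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        # Scalar relevance score e = v . u
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, hidden)))

    # Numerically stable softmax over the time axis
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted average of the frames
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy usage: three 2-dimensional frames and made-up parameters
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
v = [1.0, 1.0]
pooled, weights = self_attentive_pool(frames, W, b, v)
```

The pooled vector has the same dimension as a single frame regardless of how many frames go in, which is what allows a classifier on top to handle recordings of different durations.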

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.