AS, CL, LG, SD, ML · Jul 2, 2019

Attention model for articulatory features detection

arXiv:1907.01914v1 · 6 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses speech-related tasks such as pronunciation training and TTS, but it appears incremental, as it builds on the existing LAS architecture and adds a new decoding technique.

The paper tackles articulatory feature detection by applying the Listen, Attend and Spell architecture to phone recognition on small datasets such as TIMIT, and introduces a novel decoding technique for end-to-end training of articulatory feature detectors. It also explores joint phone recognition and articulatory feature detection through multitask learning, though the abstract reports no concrete numbers.

Articulatory distinctive features, as well as phonetic transcription, play an important role in speech-related tasks: computer-assisted pronunciation training, text-to-speech conversion (TTS), studying speech production mechanisms, and speech recognition for low-resourced languages. End-to-end approaches to speech-related tasks have gained a lot of traction in recent years. We apply the Listen, Attend and Spell~(LAS)~\cite{Chan-LAS2016} architecture to phone recognition on a small training set such as TIMIT~\cite{TIMIT-1992}. We also introduce a novel decoding technique that allows training detectors for manner and place of articulation end-to-end using attention models. Finally, we explore joint phone recognition and articulatory feature detection in a multitask learning setting.
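A minimal sketch, in plain Python, of two ingredients the abstract mentions. The first is the content-based attention step used by LAS: the decoder state s and encoder outputs h_u give energies e_u = <s, h_u>, attention weights alpha = softmax(e), and a context vector c = sum_u alpha_u h_u. The second is a multitask loss that adds phone cross-entropy to manner and place cross-entropies. Assumptions: identity projections replace the learned MLPs (phi, psi) that LAS applies before the dot product; the weight `lam`, the helper names, and the `PHONE_FEATURES` table are hypothetical illustrations, not the paper's method; the paper's novel decoding technique is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(state: Vector, encoder_outputs: Sequence[Vector]) -> Tuple[List[float], Vector]:
    """One content-based attention step in the style of LAS.

    Assumption: identity projections instead of LAS's learned MLPs,
    so each energy is a plain dot product <state, h_u>.
    """
    energies = [sum(s * h for s, h in zip(state, h_u)) for h_u in encoder_outputs]
    weights = softmax(energies)
    dim = len(encoder_outputs[0])
    # Context vector: attention-weighted sum of encoder outputs.
    context = [sum(w * h_u[d] for w, h_u in zip(weights, encoder_outputs)) for d in range(dim)]
    return weights, context


# Hypothetical subset of a phone -> (manner, place) table, for illustration only.
PHONE_FEATURES: Dict[str, Tuple[str, str]] = {
    "p": ("stop", "bilabial"),
    "t": ("stop", "alveolar"),
    "k": ("stop", "velar"),
    "m": ("nasal", "bilabial"),
    "n": ("nasal", "alveolar"),
    "s": ("fricative", "alveolar"),
}


def cross_entropy(probs: Dict[str, float], target: str) -> float:
    # Negative log-likelihood of the target label; the floor avoids log(0).
    return -math.log(max(probs.get(target, 0.0), 1e-12))


def multitask_loss(
    phone_probs: Dict[str, float],
    manner_probs: Dict[str, float],
    place_probs: Dict[str, float],
    phone: str,
    lam: float = 0.5,
) -> float:
    """Phone loss plus lam * (manner loss + place loss) for one output step.

    Assumption: articulatory targets are derived from the phone label via
    the hypothetical PHONE_FEATURES table; lam is a hypothetical weight.
    """
    manner, place = PHONE_FEATURES[phone]
    return cross_entropy(phone_probs, phone) + lam * (
        cross_entropy(manner_probs, manner) + cross_entropy(place_probs, place)
    )


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = attend([2.0, 0.0], h)
    print([round(w, 3) for w in weights], [round(c, 3) for c in context])
    loss = multitask_loss({"m": 0.5}, {"nasal": 0.5}, {"bilabial": 0.5}, "m", lam=1.0)
    print(round(loss, 4))  # 3 * ln 2 ≈ 2.0794
```

In the toy run, the first and third encoder frames match the decoder state equally well and receive equal weight. In LAS proper the projections are trained, so the alignment is learned rather than fixed by raw dot products.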

Code Implementations: 1 repo