CL · AI · Oct 20, 2016

Jointly Learning to Align and Convert Graphemes to Phonemes with Neural Attention Models

arXiv:1610.06540v1 · 43 citations
Originality: Incremental advance
AI Analysis

This work addresses a key problem in speech and language processing, with applications such as text-to-speech. It is an incremental advance, since it builds on existing attention mechanisms.

The authors tackled grapheme-to-phoneme conversion by proposing an attention-enabled encoder-decoder model that jointly learns alignments and conversions, achieving state-of-the-art results on three standard datasets (CMUDict, Pronlex, and NetTalk).

We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attention-enabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. We explore different types of attention models, including global and local attention, and our best models achieve state-of-the-art results on three standard data sets (CMUDict, Pronlex, and NetTalk).
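The abstract contrasts global and local attention without defining them. The sketch below is a minimal pure-Python illustration, assuming the formulations of Luong et al. (2015): a dot-product score, softmax alignment over all grapheme encodings for global attention, and, for local attention, a window of half-width D around a center p_t with Gaussian reweighting (sigma = D/2). The function names, the toy vectors, and the parameters `W_p` and `v_p` are hypothetical; the paper's actual scoring function and local-attention variant may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], states: Sequence[Sequence[float]]) -> Vector:
    # Context vector c_t = sum_s a_t(s) * h_s.
    dim = len(states[0])
    return [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]


def global_attention(h_t: Sequence[float], enc: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Soft alignment over every grapheme encoding, using a dot-product score (assumed)."""
    weights = softmax([dot(h_t, h_s) for h_s in enc])
    return weights, weighted_sum(weights, enc)


def predict_center(h_t: Sequence[float], W_p: Sequence[Sequence[float]],
                   v_p: Sequence[float], source_len: int) -> float:
    """Hypothetical local-p center: p_t = S * sigmoid(v_p . tanh(W_p h_t)), so 0 < p_t < S."""
    hidden = [math.tanh(dot(row, h_t)) for row in W_p]
    return source_len / (1.0 + math.exp(-dot(v_p, hidden)))


def local_attention(h_t: Sequence[float], enc: Sequence[Sequence[float]],
                    center: float, half_width: float) -> Tuple[Vector, Vector]:
    """Attend only to positions s with |s - center| <= D, scaled by a Gaussian (sigma = D/2)."""
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    window = [s for s in range(len(enc)) if abs(s - center) <= half_width]
    if not window:
        raise ValueError("attention window falls outside the source sequence")
    align = softmax([dot(h_t, enc[s]) for s in window])
    sigma = half_width / 2.0
    weights = [0.0] * len(enc)  # positions outside the window get zero weight
    for s, a in zip(window, align):
        weights[s] = a * math.exp(-((s - center) ** 2) / (2 * sigma ** 2))
    return weights, weighted_sum(weights, enc)


if __name__ == "__main__":
    # Toy 2-d encodings of a 4-grapheme word; the numbers are made up.
    enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    h_t = [2.0, 0.0]
    print(global_attention(h_t, enc)[0])
    print(local_attention(h_t, enc, center=1.0, half_width=1.0)[0])
```

Because the alignment weights are recomputed from the decoder state at every output step, the model needs no externally supplied grapheme-phoneme alignment during training, which is the contrast the abstract draws with joint sequence models. Passing `center=t` (the output step) instead of `predict_center(...)` gives the monotonic local-m variant; since words and their pronunciations often differ in length, a predicted center is the more flexible choice in this sketch.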

Code Implementations: 1 repo