CL, SD, AS · Jun 24, 2019

SylNet: An Adaptable End-to-End Syllable Count Estimator for Speech

arXiv:1906.09825v1 · 11 citations
Originality: Incremental advance
AI Analysis

This work addresses accurate syllable counting for applications such as speaking rate estimation and developmental research. It is an incremental improvement over existing heuristic and BLSTM approaches.

The paper tackles automatic syllable count estimation from speech by introducing SylNet, an end-to-end neural network method. SylNet is trained to minimize counting error without syllable-level annotations and can be adapted to new languages with limited data. Across multiple languages, it outperforms previous methods, including end-to-end BLSTMs.

Automatic syllable count estimation (SCE) is used in a variety of applications, ranging from speaking rate estimation to detecting social activity from wearable microphones and developmental research concerned with quantifying speech heard by language-learning children in different environments. The majority of previously utilized SCE methods have relied on heuristic DSP methods, and only a small number of bi-directional long short-term memory (BLSTM) approaches have applied modern machine learning to the SCE task. This paper presents a novel end-to-end method called SylNet for automatic syllable counting from speech, built on recent developments in neural network architectures. We describe how the entire model can be optimized directly to minimize SCE error on the training data without annotations aligned at the syllable level, and how it can be adapted to new languages using limited speech data with known syllable counts. Experiments on several different languages reveal that SylNet generalizes to languages beyond its training data and further improves with adaptation. It also outperforms several previously proposed methods for syllabification, including end-to-end BLSTMs.
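
To illustrate what optimizing for SCE error without syllable-aligned annotations can mean, here is a minimal Python sketch. It is not SylNet's architecture. It assumes a hypothetical model that emits one score per frame, sums those scores into an utterance-level count, and is scored by mean absolute error against the known count. The names `predicted_count` and `count_loss` are made up for this example.

```python
from typing import Sequence


def predicted_count(frame_scores: Sequence[float]) -> float:
    """Sum per-frame scores into one utterance-level count.

    Assumption: negative scores are clipped to zero so a frame
    can never subtract from the count.
    """
    return sum(max(0.0, score) for score in frame_scores)


def count_loss(
    batch_scores: Sequence[Sequence[float]],
    true_counts: Sequence[float],
) -> float:
    """Mean absolute error between predicted and known syllable counts.

    Only one number per utterance is needed as supervision;
    no syllable boundaries or frame-level labels are used.
    """
    if len(batch_scores) != len(true_counts):
        raise ValueError("one true count is required per utterance")
    if not true_counts:
        raise ValueError("batch must not be empty")
    errors = [
        abs(predicted_count(scores) - count)
        for scores, count in zip(batch_scores, true_counts)
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Two toy utterances, each labeled with a total of 2 syllables.
    scores = [[0.0, 1.0, 0.0, 1.0], [0.5, 0.5, -0.2]]
    print(count_loss(scores, [2, 2]))
```

Because the loss depends only on each utterance's total, the same objective could in principle be reused for adaptation to a new language where only utterance-level syllable counts are available.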

Code Implementations: 1 repo
