AS, CL, SD | Jan 31, 2024

Revisiting speech segmentation and lexicon learning with better features

arXiv:2401.17902v1 | 3 citations | h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses unsupervised speech processing for low-resource languages. Its contribution is incremental: it builds on an existing segmentation method and improves it by swapping in better features.

The paper tackles the problem of segmenting unlabelled speech into word-like segments and learning a lexicon without supervision, achieving state-of-the-art performance on the ZeroSpeech benchmarks.

We revisit a self-supervised method that segments unlabelled speech into word-like segments. We start from the two-stage duration-penalised dynamic programming method that performs zero-resource segmentation without learning an explicit lexicon. In the first acoustic unit discovery stage, we replace contrastive predictive coding features with HuBERT. After word segmentation in the second stage, we get an acoustic word embedding for each segment by averaging HuBERT features. These embeddings are clustered using K-means to get a lexicon. The result is good full-coverage segmentation with a lexicon that achieves state-of-the-art performance on the ZeroSpeech benchmarks.
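The lexicon-building step described in the abstract (average the frame features inside each segment, then cluster the averages with K-means) can be sketched as follows. This is a minimal illustration, not the authors' implementation: random toy vectors stand in for HuBERT features, the segment boundaries are assumed to be given by an earlier segmentation stage, and the function names `average_embeddings`, `farthest_point_init`, and `kmeans` are hypothetical. The deterministic farthest-point initialisation is an assumption chosen to keep the example reproducible; the abstract does not say how K-means is initialised.

```python
import numpy as np


def average_embeddings(features, boundaries):
    """Mean-pool frame features within each segment.

    features: (n_frames, dim) array standing in for HuBERT frame features.
    boundaries: increasing segment end indices (exclusive), last one equal
    to n_frames; the first segment implicitly starts at frame 0.
    """
    starts = [0] + list(boundaries[:-1])
    return np.stack(
        [features[s:e].mean(axis=0) for s, e in zip(starts, boundaries)]
    )


def farthest_point_init(x, k):
    """Deterministic initialisation (an assumption, not from the paper):
    start at x[0], then repeatedly add the point farthest from all
    centroids chosen so far."""
    centroids = [x[0]]
    for _ in range(1, k):
        dists = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[dists.argmax()])
    return np.array(centroids, dtype=float)


def assign(x, centroids):
    """Index of the nearest centroid (squared Euclidean) for each row of x."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(x, k, n_iter=20):
    """Plain Lloyd's K-means; returns (centroids, cluster id per row)."""
    centroids = farthest_point_init(x, k)
    for _ in range(n_iter):
        labels = assign(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(x, centroids)


# Toy data: 6 segments of 5 frames each, alternating between two "word types".
rng = np.random.default_rng(0)
prototypes = np.array([[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0]])
true_types = [0, 1, 0, 1, 0, 1]
features = np.concatenate(
    [prototypes[t] + 0.1 * rng.standard_normal((5, 4)) for t in true_types]
)
boundaries = [5, 10, 15, 20, 25, 30]

embeddings = average_embeddings(features, boundaries)  # one vector per segment
centroids, lexicon_ids = kmeans(embeddings, k=2)       # cluster id = lexicon entry
```

In this sketch each cluster index plays the role of a lexicon entry, so segments that share an index are treated as instances of the same word type. Real HuBERT features are much higher-dimensional and a realistic lexicon would use far more clusters than the two used here.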

