AS | CL | SD | Sep 22, 2024

Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming

arXiv:2409.14486v2 | 4 citations | h-index: 29
Originality: Incremental advance
AI Analysis

The paper addresses the problem of segmenting speech into words without labels, a prerequisite for language processing applications that lack transcribed data. The advance is incremental: it builds on existing methods and contributes mainly an efficiency improvement.

The paper tackles unsupervised word discovery from unlabeled speech by proposing a simpler boundary detection and clustering approach instead of dynamic programming methods, achieving similar state-of-the-art results on ZeroSpeech benchmarks while being almost five times faster.

We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation. Here we propose a much simpler strategy: we predict word boundaries using the dissimilarity between adjacent self-supervised features, then we cluster the predicted segments to construct a lexicon. For a fair comparison, we update the older ES-KMeans dynamic programming method with better features and boundary constraints. On the five-language ZeroSpeech benchmarks, our simple approach gives similar state-of-the-art results compared to the new ES-KMeans+ method, while being almost five times faster. Project webpage: https://s-malan.github.io/prom-seg-clus.
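The two-stage idea in the abstract (boundaries from the dissimilarity between adjacent features, then clustering of the resulting segments) can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `predict_boundaries`, `mean_pool_segments`, and `kmeans` are hypothetical, cosine dissimilarity with a fixed `threshold` stands in for whatever boundary criterion the paper uses, and mean pooling with a plain k-means stands in for its segment embedding and clustering steps. The toy frames are hand-made vectors, not real self-supervised features.

```python
import math

def cosine_dissimilarity(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def predict_boundaries(frames, threshold=0.5):
    """Return indices i such that a new segment starts at frame i.

    A boundary is placed at a local peak of the adjacent-frame
    dissimilarity curve that reaches `threshold` (an assumed criterion).
    """
    d = [cosine_dissimilarity(frames[i], frames[i + 1])
         for i in range(len(frames) - 1)]
    boundaries = []
    for i, v in enumerate(d):
        left = d[i - 1] if i > 0 else -math.inf
        right = d[i + 1] if i + 1 < len(d) else -math.inf
        if v >= threshold and v > left and v >= right:
            boundaries.append(i + 1)
    return boundaries

def mean_pool_segments(frames, boundaries):
    """Average the frames inside each predicted segment (assumed embedding)."""
    edges = [0] + boundaries + [len(frames)]
    dim = len(frames[0])
    return [
        [sum(f[k] for f in frames[s:e]) / (e - s) for k in range(dim)]
        for s, e in zip(edges, edges[1:])
    ]

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic init from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: recompute each centroid from its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy "utterance": three word-like stretches, the first and last alike.
frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[1.0, 0.0]] * 3
boundaries = predict_boundaries(frames)
segments = mean_pool_segments(frames, boundaries)
labels = kmeans(segments, k=2)
print(boundaries, labels)
```

In this toy case the first and third segments receive the same cluster label, which is the sense in which clustering the segments yields a lexicon: repeated word-like units map to the same entry. No search over alternative segmentations is involved, in contrast with the dynamic programming methods the abstract mentions.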

