ASCL Aug 3, 2020

Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics

arXiv:2008.00731v1, 28 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of applying unsupervised speech discovery to new or low-resource datasets by reducing hyperparameter sensitivity, though it is incremental as it builds on existing DTW methods.

The paper tackles unsupervised spoken term discovery by proposing PDTW, a probabilistic approach that adapts to corpus characteristics to find recurring speech patterns without dataset-specific tuning. It performs consistently across five languages with fixed hyperparameters and outperforms an earlier DTW-based system in pattern coverage.

Unsupervised spoken term discovery (UTD) aims at finding recurring segments of speech in a corpus of acoustic speech data. One potential approach to this problem is to use dynamic time warping (DTW) to find well-aligning patterns in the speech data. However, automatic selection of initial candidate segments for the DTW alignment, and detection of "sufficiently good" alignments among them, requires some type of pre-defined criteria, often operationalized as threshold parameters for pair-wise distance metrics between signal representations. In existing UTD systems, the optimal hyperparameters may differ across datasets, limiting their applicability to new corpora and truly low-resource scenarios. In this paper, we propose a novel probabilistic approach to DTW-based UTD named PDTW. In PDTW, distributional characteristics of the processed corpus are utilized for adaptive evaluation of alignment quality, thereby enabling systematic discovery of pattern pairs whose similarity exceeds what would be expected by coincidence. We test PDTW on the Zero Resource Speech Challenge 2017 datasets as part of the 2020 implementation of the challenge. The results show that the system performs consistently on all five tested languages using fixed hyperparameters, clearly outperforming the earlier DTW-based system in terms of coverage of the detected patterns.
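
To make the idea of corpus-adaptive alignment scoring more concrete, here is a minimal Python sketch. It is not the PDTW algorithm from the paper. It only illustrates the general principle the abstract describes: instead of a hand-tuned distance threshold, the cutoff is derived from the distribution of alignment costs observed in the corpus itself. The function names (`dtw_cost`, `chance_threshold`), the Euclidean frame distance, the length normalization, and the parameters `n_pairs` and `alpha` are all assumptions made for illustration.

```python
import numpy as np


def dtw_cost(a, b):
    """Length-normalized DTW alignment cost between two feature sequences.

    a has shape (n, d) and b has shape (m, d). Euclidean frame distance is
    an assumption of this sketch, not something the abstract specifies.
    """
    n, m = len(a), len(b)
    # Pairwise distances between every frame of a and every frame of b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Accumulated cost matrix with an infinite border.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)


def chance_threshold(segments, rng, n_pairs=200, alpha=0.05):
    """Empirical alpha-quantile of DTW costs over randomly drawn segment pairs.

    Random pairs stand in for "coincidental" similarity (hypothetical choice).
    A candidate pair whose cost falls below the returned value aligns better
    than roughly (1 - alpha) of the random pairs sampled from this corpus.
    """
    costs = []
    for _ in range(n_pairs):
        i, j = rng.choice(len(segments), size=2, replace=False)
        costs.append(dtw_cost(segments[i], segments[j]))
    return float(np.quantile(costs, alpha))
```

Because the threshold is recomputed from whatever segments are passed in, the same `alpha` can in principle be reused across corpora with different feature scales, which is the kind of dataset independence the abstract argues for.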

Code Implementations: 2 repos
