AS, CL · Jul 5, 2024

Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units

arXiv:2407.04652v1 · 1 citation · h-index: 44
Originality: Incremental advance
AI Analysis

This paper addresses the scarcity of transcribed data for training KWS systems in speech processing, offering an incremental improvement over existing methods.

The paper tackles the performance gap between end-to-end keyword search (KWS) and ASR-based methods by proposing a pretraining method that applies acoustic unit discovery (AUD) to untranscribed data; after finetuning, the pretrained model significantly outperforms one trained from scratch.

End-to-end (E2E) keyword search (KWS) has emerged as an alternative and complementary approach to conventional keyword search, which depends on the output of automatic speech recognition (ASR) systems. While E2E methods greatly simplify the KWS pipeline, they generally have worse performance than their ASR-based counterparts, which can benefit from pretraining with untranscribed data. In this work, we propose a method for pretraining E2E KWS systems with untranscribed data, which involves using acoustic unit discovery (AUD) to obtain discrete units for untranscribed data and then learning to locate sequences of such units in the speech. We conduct experiments across languages and AUD systems: we show that finetuning such a model significantly outperforms a model trained from scratch, and the performance improvements are generally correlated with the quality of the AUD system used for pretraining.
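The pretraining idea, using discovered units as pseudo-transcripts and training the model to locate unit sequences, can be illustrated by how a single pretraining example might be built. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the AUD system yields one integer unit label per frame, that a "pseudo-keyword" is a contiguous run of 3 to 6 collapsed units, and that the location target is a per-frame 0/1 mask. The function names (`collapse_units`, `locate_unit_sequence`, `sample_pretraining_example`) and the length range are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# (unit id, start frame, end frame exclusive); hypothetical representation
Segment = Tuple[int, int, int]


def collapse_units(frame_units: List[int]) -> List[Segment]:
    """Merge runs of identical frame-level AUD labels into unit segments."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(frame_units) + 1):
        # Close the current run at end of input or when the label changes.
        if i == len(frame_units) or frame_units[i] != frame_units[start]:
            segments.append((frame_units[start], start, i))
            start = i
    return segments


def locate_unit_sequence(frame_units: List[int], query: List[int]) -> List[int]:
    """Return a 0/1 target per frame marking every occurrence of `query`
    in the collapsed unit sequence (assumed target format)."""
    if not query:
        raise ValueError("query must contain at least one unit")
    segments = collapse_units(frame_units)
    units = [u for u, _, _ in segments]
    n = len(query)
    targets = [0] * len(frame_units)
    for i in range(len(units) - n + 1):
        if units[i:i + n] == query:
            # Mark frames from the first matching unit to the last one.
            for f in range(segments[i][1], segments[i + n - 1][2]):
                targets[f] = 1
    return targets


def sample_pretraining_example(
    frame_units: List[int],
    min_len: int = 3,  # hypothetical pseudo-keyword length range
    max_len: int = 6,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Sample a pseudo-keyword (contiguous discovered units) and its
    frame-level location targets within the same utterance."""
    if rng is None:
        rng = random.Random()
    units = [u for u, _, _ in collapse_units(frame_units)]
    if len(units) < min_len:
        raise ValueError("utterance has too few units for a pseudo-keyword")
    n = rng.randint(min_len, min(max_len, len(units)))
    start = rng.randrange(len(units) - n + 1)
    query = units[start:start + n]
    return query, locate_unit_sequence(frame_units, query)


if __name__ == "__main__":
    frames = [5, 5, 2, 2, 2, 7, 3, 3, 5, 2, 7]  # toy AUD output, one label per frame
    print(locate_unit_sequence(frames, [5, 2, 7]))
    # [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1]
    print(sample_pretraining_example(frames, rng=random.Random(0)))
```

Collapsing repeated frame labels before matching means a pseudo-keyword is defined over unit sequences rather than raw frame durations, which loosely mirrors how a keyword is a sequence of phones regardless of speaking rate; whether the paper does the same is an assumption here.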

Code Implementations: 2 repos