SD · LG · AS · Nov 6, 2021

Towards noise robust trigger-word detection with contrastive learning pre-task for fast on-boarding of new trigger-words

arXiv:2111.03971v3 · 1 citation
Originality: Incremental advance
AI Analysis

The paper addresses the tedious, time-consuming process of supporting new trigger-words in voice assistants, though the contribution appears incremental.

The paper tackles the problem of fast onboarding of new trigger-words for voice assistants by using contrastive learning as a pre-training task to reduce data requirements, showing comparable results to traditional methods with less data.

Trigger-word detection plays an important role as the entry point of a user's communication with voice assistants. But supporting a particular word as a trigger-word involves a huge amount of data collection, augmentation and labelling for that word. This makes supporting new trigger-words a tedious and time-consuming process. To combat this, we explore the use of contrastive learning as a pre-training task that helps the detection model generalize to different words and noise conditions. We explore supervised contrastive techniques and also propose a novel self-supervised training technique using chunked words from long-sentence audio. We show that both the supervised and the new self-supervised contrastive pre-training techniques achieve results comparable to traditional classification pre-training on new trigger-words when less data is available.
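To make the idea of supervised contrastive pre-training concrete, here is a minimal NumPy sketch of a generic supervised contrastive (SupCon-style) loss. It is an illustration only: the abstract does not state the exact loss, temperature, or encoder the authors use, so the function name `supervised_contrastive_loss`, the `temperature` value, and the toy embeddings are all assumptions. The loss treats embeddings that share a word label as positives and pulls them together, while pushing embeddings of other words apart.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss (hypothetical helper, not the paper's code).

    embeddings: (n, d) array, one vector per audio clip.
    labels:     (n,) array of word labels; same label means positive pair.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    n = z.shape[0]

    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    not_self = ~np.eye(n, dtype=bool)
    # Positives share the anchor's label; the anchor itself is excluded.
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Log-softmax over all other samples, shifted for numerical stability.
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    # Average log-probability of the positives, per anchor that has any.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0  # no positive pairs in this batch
    per_anchor = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-per_anchor.mean())


if __name__ == "__main__":
    # Toy embeddings: two tight clusters, standing in for two spoken words.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(emb, [0, 0, 1, 1]))  # labels match clusters
    print(supervised_contrastive_loss(emb, [0, 1, 0, 1]))  # labels cut across clusters
```

In the toy usage, the loss is lower when the labels agree with the cluster structure of the embeddings, which is the property a pre-trained encoder is optimised toward. In a self-supervised variant, the labels would presumably be replaced by some notion of "same chunk under different augmentation", but how the paper builds those pairs is not described in the abstract.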

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
