Feb 26, 2021

The NPU System for the 2020 Personalized Voice Trigger Challenge

arXiv:2102.13552v1 · 8 citations
AI Analysis

This work addresses the challenge of accurately detecting personalized voice triggers in audio. The contribution is incremental, as it builds on existing keyword spotting and speaker verification methods.

The paper tackles personalized voice trigger detection with a system that combines a keyword spotting (KWS) subsystem, based on a multi-scale dilated temporal convolutional network, and a speaker verification (SV) subsystem. It achieves detection costs of 0.081 and 0.091 in the close-talking and far-field tasks, respectively.
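The multi-scale dilated temporal convolution idea can be illustrated with a minimal pure-Python sketch. It assumes a single channel, one shared toy kernel, ReLU activations, residual connections, and fusion of scales by summing every layer's output. These are illustrative assumptions, not the MDTC architecture from the paper, and the helper names (`dilated_causal_conv1d`, `receptive_field`, `multi_scale_stack`) are hypothetical.

```python
from typing import List


def dilated_causal_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k*dilation].

    Frames before t=0 are treated as zeros, so output t never looks at the future.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input frames seen by one output frame of the stacked layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def multi_scale_stack(x: List[float], kernel: List[float], dilations: List[int]) -> List[float]:
    """Stack dilated causal convs and sum each layer's output (toy multi-scale fusion)."""
    h = list(x)
    fused = [0.0] * len(x)
    for d in dilations:
        out = [max(0.0, v) for v in dilated_causal_conv1d(h, kernel, d)]  # ReLU
        h = [a + b for a, b in zip(h, out)]  # residual connection
        fused = [f + o for f, o in zip(fused, out)]  # accumulate this scale
    return fused


if __name__ == "__main__":
    # Doubling dilations grow the receptive field quickly with few layers.
    print(receptive_field(3, [1, 2, 4, 8]))
    print(multi_scale_stack([1.0, 0.0, 0.0, 0.0], [1.0, 1.0], [1, 2]))
```

Doubling the dilation per layer is a common way to cover a long context (here 31 frames with four kernel-size-3 layers) at low parameter cost, which suits a small-footprint KWS model.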

This paper describes the system developed by the NPU team for the 2020 Personalized Voice Trigger Challenge. Our submitted system consists of two independently trained subsystems: a small-footprint keyword spotting (KWS) system and a speaker verification (SV) system. For the KWS system, a multi-scale dilated temporal convolutional (MDTC) network is proposed to detect the wake-up word (WuW). The KWS system predicts the posterior probability that an audio utterance contains the WuW and simultaneously estimates the location of the WuW. When the posterior probability of the WuW reaches a predefined threshold, the speaker identity of the triggered segment is determined by the SV system. On the evaluation dataset, our submitted system obtains detection costs of 0.081 and 0.091 in the close-talking and far-field tasks, respectively.
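The two-stage decision described above (KWS posterior threshold, then SV on the triggered segment) can be sketched as follows. This is a minimal illustration under stated assumptions: only the first frame whose posterior crosses the threshold is used, the WuW location is assumed to be the `window` frames ending at that frame, speaker embeddings are compared by cosine similarity, and both thresholds are arbitrary. The function names and the toy `mean_pool` embedding are hypothetical stand-ins, not the authors' SV model.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def mean_pool(segment: Sequence[Vector]) -> List[float]:
    """Toy 'speaker embedding': average the frame feature vectors."""
    return [sum(col) / len(segment) for col in zip(*segment)]


def detect_personalized_trigger(
    posteriors: Sequence[float],
    frames: Sequence[Vector],
    embed: Callable[[Sequence[Vector]], Vector],
    enrolled: Vector,
    kws_threshold: float = 0.5,
    sv_threshold: float = 0.6,
    window: int = 3,
) -> Optional[Tuple[int, float]]:
    """Return (trigger_frame, sv_score) if KWS fires and SV accepts, else None."""
    for t, p in enumerate(posteriors):
        if p >= kws_threshold:
            # Stage 1 fired: take the assumed WuW segment ending at frame t.
            start = max(0, t - window + 1)
            score = cosine(embed(frames[start:t + 1]), enrolled)
            # Stage 2: accept only if the segment matches the enrolled speaker.
            return (t, score) if score >= sv_threshold else None
    return None  # KWS never fired in this utterance


if __name__ == "__main__":
    feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    post = [0.1, 0.2, 0.9, 0.3]
    print(detect_personalized_trigger(post, feats, mean_pool, [0.0, 1.0], window=1))
```

Running SV only after KWS fires keeps the heavier speaker model off the always-on path, which is consistent with the abstract's description of independently trained subsystems.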

Code Implementations: 1 repo