AS · HC · LG · Sep 22, 2022

Cross-domain Voice Activity Detection with Self-Supervised Representations

arXiv:2209.11061v1 · 5 citations · h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses the domain adaptation challenge in VAD for speech applications, offering a more robust solution than current methods, though it is incremental as it builds on existing SSL techniques.

The paper tackles voice activity detection (VAD) in cross-domain settings, where a change of speaker, microphone, or environment degrades performance. Using the crowd-sourced Common Voice corpus, it shows that self-supervised learning (SSL) representations outperform both hand-crafted features and off-the-shelf VADs, with significant gains in cross-domain scenarios.

Voice Activity Detection (VAD) aims at detecting speech segments in an audio signal, which is a necessary first step for many of today's speech-based applications. Current state-of-the-art methods focus on training a neural network that exploits features directly contained in the acoustics, such as Mel Filter Banks (MFBs). Such methods therefore require an extra normalisation step to adapt to a new domain where the acoustics are affected, which can be due simply to a change of speaker, microphone, or environment. In addition, this normalisation step is usually a rather rudimentary method with certain limitations, such as being highly susceptible to the amount of data available for the new domain. Here, we exploited the crowd-sourced Common Voice (CV) corpus to show that representations based on Self-Supervised Learning (SSL) can adapt well to different domains, because they are computed with contextualised representations of speech across multiple domains. SSL representations also achieve better results than systems based on hand-crafted representations (MFBs) and off-the-shelf VADs, with significant improvement in cross-domain settings.
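
The abstract does not name the normalisation method it has in mind. As a minimal sketch, assuming it is something like per-domain mean and variance normalisation of the acoustic features (an assumption, not the paper's stated method), the following shows why such a step depends on how much data the new domain provides. The feature frames are synthetic stand-ins for MFBs, and every function name here is hypothetical.

```python
import random
import statistics


def fit_mean_std(frames):
    """Estimate per-dimension mean and standard deviation from feature frames."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [statistics.fmean(d) for d in dims]
    # Fall back to 1.0 if a dimension has zero spread, to avoid dividing by zero.
    stds = [statistics.pstdev(d) or 1.0 for d in dims]
    return means, stds


def normalise(frames, means, stds):
    """Shift and scale every frame with the given per-dimension statistics."""
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)] for frame in frames]


def synthetic_domain(n_frames, offset, scale, n_dims=4, seed=0):
    """Hypothetical stand-in for MFB frames recorded in one acoustic domain."""
    rng = random.Random(seed)
    return [
        [offset + scale * rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        for _ in range(n_frames)
    ]


# A "new domain" whose features are shifted and rescaled (illustrative values).
TRUE_OFFSET = 3.0
target = synthetic_domain(5000, offset=TRUE_OFFSET, scale=2.0, seed=1)
few = target[:5]  # only a handful of frames available for adaptation

means_full, stds_full = fit_mean_std(target)
means_few, stds_few = fit_mean_std(few)

# Worst-case error of the estimated mean across feature dimensions.
mean_error_full = max(abs(m - TRUE_OFFSET) for m in means_full)
mean_error_few = max(abs(m - TRUE_OFFSET) for m in means_few)

# Normalising with well-estimated statistics centres the new domain.
normalised = normalise(target, means_full, stds_full)
```

With thousands of frames the estimated statistics sit close to the true shift, whereas statistics fitted on five frames are much noisier, so the same features end up normalised inconsistently. This is the kind of data dependence the abstract points to when it motivates representations that need no per-domain statistics.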

