ASSDSP Apr 12, 2021

Improvement of Noise-Robust Single-Channel Voice Activity Detection with Spatial Pre-processing

arXiv:2104.05481v1
Originality: Incremental advance
AI Analysis

This work addresses the noise robustness of voice activity detection for speech processing applications. The advance is incremental: it builds on existing single-channel VAD (SVAD) methods by adding spatial pre-processing.

The paper tackles noise-robust voice activity detection (VAD) by placing spatial pre-processing in front of single-channel VAD (SVAD) algorithms: a beamformer and a spatial target speaker detector. Both methods significantly improve SVAD performance across all signal-to-noise ratios, and the pre-processed SVAD algorithms outperform a baseline multi-channel VAD in challenging noise conditions.

Voice activity detection (VAD) remains a challenge in noisy environments. With access to multiple microphones, prior studies have attempted to improve the noise robustness of VAD by creating multi-channel VAD (MVAD) methods. However, MVAD is relatively new compared to single-channel VAD (SVAD), which has been thoroughly developed in the past. It might therefore be advantageous to improve SVAD methods with pre-processing to obtain superior VAD, which is under-explored. This paper improves SVAD through two pre-processing methods, a beamformer and a spatial target speaker detector. The spatial detector sets signal frames to zero when no potential speaker is present within a target direction. The detector may be implemented as a filter, meaning the input signal for the SVAD is filtered according to the detector's output; or it may be implemented as a spatial VAD to be combined with the SVAD output. The evaluation is made on a noisy reverberant speech database, with clean speech from the Aurora 2 database and with white and babble noise. The results show that SVAD algorithms are significantly improved by the presented pre-processing methods, especially the spatial detector, across all signal-to-noise ratios. The SVAD algorithms with pre-processing significantly outperform a baseline MVAD in challenging noise conditions.
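
The two ways of using the spatial detector described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The names `energy_svad`, `spatial_filter`, `vad_filter_mode`, and `vad_combine_mode` are hypothetical, the energy threshold is a stand-in for a real SVAD algorithm, the per-frame `target_present` flags are assumed to come from some spatial detector, and the logical AND used to combine the two decisions is an assumption, since the abstract does not state the combination rule.

```python
from typing import Callable, List

Frame = List[float]


def energy_svad(frames: List[Frame], threshold: float = 0.01) -> List[bool]:
    """Stand-in SVAD (assumption): speech if mean frame energy exceeds a threshold."""
    decisions = []
    for frame in frames:
        energy = sum(x * x for x in frame) / len(frame) if frame else 0.0
        decisions.append(energy > threshold)
    return decisions


def spatial_filter(frames: List[Frame], target_present: List[bool]) -> List[Frame]:
    """Set a frame to zero when no potential speaker is in the target direction."""
    return [
        list(frame) if present else [0.0] * len(frame)
        for frame, present in zip(frames, target_present)
    ]


def vad_filter_mode(
    frames: List[Frame],
    target_present: List[bool],
    svad: Callable[[List[Frame]], List[bool]] = energy_svad,
) -> List[bool]:
    """Detector as a filter: the SVAD only sees the spatially filtered signal."""
    return svad(spatial_filter(frames, target_present))


def vad_combine_mode(
    frames: List[Frame],
    target_present: List[bool],
    svad: Callable[[List[Frame]], List[bool]] = energy_svad,
) -> List[bool]:
    """Detector as a spatial VAD: combine with the SVAD output (AND is assumed)."""
    return [s and p for s, p in zip(svad(frames), target_present)]


# Toy input: frame 1 is loud but comes from outside the target direction.
frames = [
    [0.5, -0.4, 0.3],
    [0.4, 0.4, -0.5],
    [0.001, 0.0, -0.001],
    [0.3, -0.3, 0.2],
]
target_present = [True, False, True, True]

svad_only = energy_svad(frames)
filtered = vad_filter_mode(frames, target_present)
combined = vad_combine_mode(frames, target_present)
```

In this toy input the SVAD alone flags the off-target frame as speech, while both spatial variants reject it. With a frame-independent SVAD such as this energy threshold the two variants give the same decisions; they may differ for SVAD algorithms that use temporal context, because zeroed frames then change what the SVAD sees in neighbouring frames.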
