LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models
This work addresses noise robustness in speaker verification systems, which is critical for real-world applications such as security and voice assistants. The contribution is incremental, however, because it builds on existing speech enhancement methods.
The paper tackles the drop in speaker verification (SV) performance in noisy environments. It proposes LC4SV, a denoising framework in which a learning-based interpolation agent compensates for artifacts in enhanced signals, yielding consistent performance improvements across various unseen SV systems.
The performance of speaker verification (SV) models may drop dramatically in noisy environments. A speech enhancement (SE) module can be used as a front-end strategy. However, existing SE methods may fail to bring performance improvements to downstream SV systems due to artifacts in the predicted signals of SE models. To compensate for artifacts, we propose a generic denoising framework named LC4SV, which can serve as a pre-processor for various unknown downstream SV models. In LC4SV, we employ a learning-based interpolation agent to automatically generate the appropriate coefficients between the enhanced signal and its noisy input to improve SV performance in noisy environments. Our experimental results demonstrate that LC4SV consistently improves the performance of various unseen SV systems. To the best of our knowledge, this work is the first attempt to develop a learning-based interpolation scheme aiming at improving SV performance in noisy environments.
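The interpolation step described in the abstract can be illustrated with a minimal sketch. It assumes a single scalar coefficient per utterance and a convex combination in which `alpha` weights the enhanced signal; the abstract does not specify either convention. The function name `interpolate` and the fixed `alpha` value are hypothetical. In LC4SV the coefficient would come from the learned interpolation agent, which is not modeled here.

```python
import numpy as np


def interpolate(noisy, enhanced, alpha):
    """Blend an enhanced signal with its noisy input.

    Assumed convention (not taken from the paper):
        output = alpha * enhanced + (1 - alpha) * noisy
    so alpha = 1 keeps only the enhanced signal and alpha = 0 keeps
    only the noisy input.
    """
    noisy = np.asarray(noisy, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    if noisy.shape != enhanced.shape:
        raise ValueError("noisy and enhanced signals must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * enhanced + (1.0 - alpha) * noisy


# Toy usage with synthetic data standing in for real waveforms.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 20.0, 16000))
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
# Placeholder for the output of a speech enhancement model.
enhanced = 0.9 * clean

# Hypothetical fixed coefficient; a learned agent would predict this value.
alpha = 0.7
sv_input = interpolate(noisy, enhanced, alpha)
```

Mixing part of the noisy input back in is a way to trade residual noise against enhancement artifacts. The blended signal `sv_input` would then be passed to the downstream SV model in place of the raw enhanced signal.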