AS LG SD

Nov 11, 2021

MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification

Ladislav Mošner, Oldřich Plchot, Lukáš Burget, Jan Černocký

arXiv:2111.06458v13.311 citations

Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a dataset that addresses the shortage of multi-channel training data for speaker verification research. The contribution is incremental, since it builds on existing datasets and methods.

The authors address the lack of a standard benchmark for multi-channel speaker verification by creating the MultiSV dataset. Its training data is simulated from VoxCeleb, and its development and evaluation trials come from a modified version of the VOiCES corpus. They report results with two systems that use neural network-based beamforming.

Motivated by the unconsolidated data situation and the lack of a standard benchmark in the field, we complement our previous efforts and present a comprehensive corpus designed for training and evaluating text-independent multi-channel speaker verification systems. It can also be readily used for experiments with dereverberation, denoising, and speech enhancement. We tackled the ever-present problem of the lack of multi-channel training data by utilizing data simulation on top of clean parts of the VoxCeleb dataset. The development and evaluation trials are based on the retransmitted Voices Obscured in Complex Environmental Settings (VOiCES) corpus, which we modified to provide multi-channel trials. We publish full recipes that create the dataset from public sources as the MultiSV corpus, and we provide results with two of our multi-channel speaker verification systems that use neural network-based beamforming, based either on predicting ideal binary masks or on the more recent Conv-TasNet.
