Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection
This work addresses speaker verification, an audio processing task, and improves accuracy through an attention mechanism and the handling of noisy training labels. The contribution is incremental, since it builds on existing attention modules.
The paper tackles speaker verification by introducing a simple attention module (SimAM) and an iterative noisy label detection method. On the VoxCeleb1 original test trials, SimAM achieves a 0.675% equal error rate (EER), and the detection method reduces it further to 0.643%.
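SimAM reweights each activation by how distinctive it is within its channel, using no learnable parameters. The minimal NumPy sketch below assumes the closed-form energy from the original SimAM formulation, where each activation t in a channel with mean mu and variance sigma^2 gets an inverse energy 1/e* = (t - mu)^2 / (4(sigma^2 + lambda)) + 0.5, and the output is x * sigmoid(1/e*). The function name `simam`, the default `e_lambda = 1e-4`, and the (batch, channels, frequency, time) layout are assumptions for illustration. The sketch does not show where the module is placed inside the speaker embedding network.

```python
import numpy as np


def simam(x: np.ndarray, e_lambda: float = 1e-4) -> np.ndarray:
    """Parameter-free SimAM-style attention over a (batch, channel, H, W) array.

    Assumed formulation: for every activation t in a channel,
        1/e* = (t - mu)^2 / (4 * (sigma^2 + lambda)) + 0.5
    and the output is x * sigmoid(1/e*). `e_lambda` is a fixed constant,
    so nothing here is learned.
    """
    _, _, h, w = x.shape
    n = h * w - 1  # denominator for the per-channel variance estimate
    # Squared deviation of each activation from its channel's spatial mean.
    d = (x - x.mean(axis=(2, 3), keepdims=True)) ** 2
    var = d.sum(axis=(2, 3), keepdims=True) / n
    inv_energy = d / (4.0 * (var + e_lambda)) + 0.5
    # Sigmoid gate: distinctive activations (large deviation) get weights near 1.
    weights = 1.0 / (1.0 + np.exp(-inv_energy))
    return x * weights


# Usage: shapes are illustrative (2 utterances, 32 channels, 40 frequency bins, 200 frames).
rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 32, 40, 200))
out = simam(feats)
print(out.shape)
```

Because 1/e* is never below 0.5, every gate lies between sigmoid(0.5) (about 0.62) and 1, so the module only rescales activations and never flips their sign. Since lambda is fixed, the sketch is consistent with the plug-and-play, no-extra-parameters property described in the text.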
Recently, attention mechanisms such as the squeeze-and-excitation (SE) module and the convolutional block attention module (CBAM) have achieved great success in deep learning-based speaker verification systems. This paper introduces an effective yet simple alternative, the simple attention module (SimAM), for speaker verification. SimAM is a plug-and-play module that adds no extra model parameters. In addition, because a large-scale dataset labeled by human annotators or other automated processes may contain noisy labels, we propose a noisy label detection method that iteratively filters out samples with noisy labels from the training data. Owing to the memorization effect of deep neural networks (DNNs), an over-parameterized DNN may fit such noisily labeled data, resulting in a performance drop. Experiments are conducted on the VoxCeleb dataset. The speaker verification model with SimAM achieves a 0.675% equal error rate (EER) on the VoxCeleb1 original test trials, and our proposed iterative noisy label detection method further reduces the EER to 0.643%.
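The abstract does not specify how a noisy label is detected. As one possible reading, the hypothetical sketch below flags a sample when the cosine similarity between its embedding and the centroid of its labeled speaker falls below a threshold, drops the flagged samples, retrains, and repeats. The centroid-similarity criterion, the 0.5 threshold, the round count, and the `train_and_embed` callable are illustrative assumptions, not the paper's method. In the usage example, training is replaced by fixed synthetic embeddings.

```python
import numpy as np


def flag_noisy_labels(emb, labels, threshold):
    """Return a boolean mask, True where a sample's label looks wrong.

    Hypothetical criterion: cosine similarity between the sample embedding and
    the centroid of its labeled speaker is below `threshold`.
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    noisy = np.zeros(len(labels), dtype=bool)
    for spk in np.unique(labels):
        idx = np.flatnonzero(labels == spk)
        centroid = emb[idx].mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        noisy[idx] = emb[idx] @ centroid < threshold
    return noisy


def iterative_filtering(train_and_embed, labels, rounds=3, threshold=0.5):
    """Repeatedly retrain, embed, and drop flagged samples from the training set.

    `train_and_embed(indices)` is a hypothetical callable: it retrains the model
    on the given sample indices and returns one embedding row per index.
    """
    keep = np.arange(len(labels))  # indices still used for training
    for _ in range(rounds):
        emb = train_and_embed(keep)
        noisy = flag_noisy_labels(emb, labels[keep], threshold)
        if not noisy.any():
            break  # no more suspicious samples; stop early
        keep = keep[~noisy]
    return keep


# Usage with synthetic data: 3 well-separated speakers, 10 samples each.
rng = np.random.default_rng(0)
centers = np.eye(3, 16)
true_spk = np.repeat(np.arange(3), 10)
feats = centers[true_spk] + 0.05 * rng.standard_normal((30, 16))
labels = true_spk.copy()
labels[[0, 15]] = [1, 2]  # inject two wrong labels
kept = iterative_filtering(lambda idx: feats[idx], labels)
removed = sorted(set(range(30)) - set(kept.tolist()))
print(removed)
```

Filtering from the surviving set each round, rather than re-scoring the full dataset, is also an assumption; it makes removals permanent and lets the loop stop as soon as a round flags nothing.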