SD CL AS · Nov 3, 2022

Dynamic Kernels and Channel Attention for Low Resource Speaker Verification

Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

arXiv:2211.02000v2 · 2.2 · h-index: 33

Originality: Incremental advance

AI Analysis

This work offers an efficient solution for speaker verification with limited data, although the advance is incremental because it builds on existing CNN and attention mechanisms.

The paper tackles low-resource speaker verification by introducing attention-based dynamic kernels and channel attention in a convolutional neural network, achieving 1.62% EER and 0.18 miniDCF on VoxCeleb1 with a 17% relative improvement over ECAPA-TDNN.
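Both headline metrics are computed from a list of scored verification trials. The sketch below is a minimal NumPy version, assuming that a higher score means "same speaker" and that a trial is accepted when its score is at or above the threshold. The miniDCF operating point (p_target = 0.01, C_miss = C_fa = 1) is an assumption, since the summary does not state which setting the paper uses; the function names are illustrative.

```python
import numpy as np

def error_rates(scores, labels):
    """Sweep every observed score as a threshold.

    scores: higher means 'same speaker'; labels: 1 = target trial, 0 = non-target.
    Returns thresholds, false-acceptance rates and false-rejection rates.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # np.unique returns sorted values; +inf adds the "reject everything" point
    thresholds = np.append(np.unique(scores), np.inf)
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    # A trial is accepted when score >= threshold
    far = np.array([np.sum((scores >= t) & (labels == 0)) for t in thresholds]) / n_nontarget
    frr = np.array([np.sum((scores < t) & (labels == 1)) for t in thresholds]) / n_target
    return thresholds, far, frr

def compute_eer(scores, labels):
    """Equal error rate: the point where FAR and FRR are (approximately) equal."""
    _, far, frr = error_rates(scores, labels)
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

def compute_min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum normalised detection cost over all thresholds (assumed operating point)."""
    _, far, frr = error_rates(scores, labels)
    dcf = c_miss * frr * p_target + c_fa * far * (1 - p_target)
    # Normalise by the cost of the best trivial system (always accept or always reject)
    c_default = min(c_miss * p_target, c_fa * (1 - p_target))
    return dcf.min() / c_default

# Toy trial list: two target and two non-target trials with overlapping scores
scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
eer = compute_eer(scores, labels)          # 0.5 on this toy set
min_dcf = compute_min_dcf(scores, labels)  # 0.5 on this toy set
```

The threshold sweep is exact but quadratic in the number of trials; for a full VoxCeleb1 trial list a sorted cumulative-sum version (or an ROC routine from a metrics library) would be the practical choice.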

State-of-the-art speaker verification frameworks have typically improved verification performance by making models increasingly deeper (more layers) and wider (more channels). Instead, this paper proposes increasing the model's resolution capability with attention-based dynamic kernels in a convolutional neural network, so that the model parameters become conditioned on the input features. The attention weights on the kernels are further distilled by channel attention and multi-layer feature aggregation to learn global features from speech. Because the structure of the model parameters self-adapts to each input, this approach offers an efficient way to improve representation capacity with fewer data resources. The proposed dynamic convolutional model achieved 1.62% EER and 0.18 miniDCF on the VoxCeleb1 test set, a 17% relative improvement over ECAPA-TDNN trained with the same resources.
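The ideas in the abstract (input-conditioned attention over a set of convolution kernels, a channel-attention gate, and aggregation of features from several layers) can be illustrated with a generic dynamic-convolution block. This is a minimal NumPy sketch, not the authors' implementation: the 1-D convolution over time frames, the global-average-pooled attention input, the softmax temperature, the squeeze-and-excitation style gate, the layer sizes and every function name are assumptions chosen for illustration.

```python
import numpy as np

def softmax(v, temperature=1.0):
    e = np.exp((v - v.max()) / temperature)
    return e / e.sum()

def conv1d_same(x, w):
    """x: (C_in, T), w: (C_out, C_in, k) with odd k -> (C_out, T), zero 'same' padding."""
    c_out, _, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    y = np.zeros((c_out, x.shape[1]))
    for t in range(x.shape[1]):
        # Window of k frames centred on frame t, across all input channels
        y[:, t] = np.tensordot(w, xp[:, t:t + k], axes=([1, 2], [0, 1]))
    return y

def dynamic_conv1d(x, kernels, attn_w, temperature=1.0):
    """Attention over K candidate kernels, conditioned on the input (hypothetical layout).

    kernels: (K, C_out, C_in, k); attn_w: (K, C_in) projection producing kernel logits.
    """
    z = x.mean(axis=1)                     # global average pool over time -> (C_in,)
    pi = softmax(attn_w @ z, temperature)  # (K,) attention weights that sum to 1
    w = np.tensordot(pi, kernels, axes=1)  # input-specific kernel -> (C_out, C_in, k)
    return conv1d_same(x, w), pi

def channel_attention(y, w1, w2):
    """Squeeze-and-excitation style gate. w1: (C_r, C), w2: (C, C_r)."""
    s = y.mean(axis=1)                     # squeeze: per-channel mean over time
    h = np.maximum(w1 @ s, 0.0)            # bottleneck + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # sigmoid gate in (0, 1) per channel
    return y * g[:, None]

def init_block(rng, c_in, c_out, n_kernels=4, k=3, reduction=4):
    """Random parameters for one hypothetical dynamic-conv + channel-attention block."""
    return {
        "kernels": rng.normal(scale=0.1, size=(n_kernels, c_out, c_in, k)),
        "attn": rng.normal(size=(n_kernels, c_in)),
        "se1": rng.normal(size=(c_out // reduction, c_out)),
        "se2": rng.normal(size=(c_out, c_out // reduction)),
    }

def block(x, p):
    y, pi = dynamic_conv1d(x, p["kernels"], p["attn"])
    return channel_attention(np.maximum(y, 0.0), p["se1"], p["se2"]), pi

rng = np.random.default_rng(0)
x = rng.normal(size=(40, 200))  # illustrative: 40 filterbank channels x 200 frames
layers = [init_block(rng, 40, 64), init_block(rng, 64, 64)]
outputs, h = [], x
for p in layers:
    h, pi = block(h, p)
    outputs.append(h)
# Multi-layer feature aggregation: concatenate every block's output along channels
features = np.concatenate(outputs, axis=0)  # shape (128, 200)
```

Because the kernel actually applied is a convex combination of the K candidates, the per-frame cost of the convolution itself stays that of a single kernel; the extra parameters buy input-dependent behaviour rather than extra depth or width, which is the trade-off the abstract describes.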

