SD CL AS SP
Feb 3, 2022

MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li

arXiv:2202.01624v312.293 citations
h-index: 74

Originality: Incremental advance

AI Analysis

This work addresses speaker verification for short utterances; it is an incremental advance within a specialized domain.

The paper tackled the problem of text-independent speaker verification with short utterances by proposing a multi-scale frequency-channel attention (MFA) mechanism integrated into a TDNN framework, achieving state-of-the-art performance on the VoxCeleb database while reducing parameters and computational complexity.

The time delay neural network (TDNN) is one of the state-of-the-art neural solutions to text-independent speaker verification. However, TDNNs require a large number of filters to capture speaker characteristics in any local frequency region. In addition, the performance of such systems may degrade under short-utterance scenarios. To address these issues, we propose multi-scale frequency-channel attention (MFA), which characterizes speakers at different scales through a novel dual-path design consisting of a convolutional neural network and a TDNN. We evaluate the proposed MFA on the VoxCeleb database and observe that the framework with MFA achieves state-of-the-art performance while reducing the number of parameters and the computational complexity. Further, the MFA mechanism is found to be effective for speaker verification with short test utterances.
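For readers unfamiliar with the TDNN building block named in the abstract, the following is a minimal sketch of a single generic time-delay layer, written in plain Python. It is not the authors' MFA model and does not implement the frequency-channel attention or the dual-path design. The function name `tdnn_layer`, the context offsets `(-2, 0, 2)`, and the toy weights are assumptions chosen only for illustration.

```python
from typing import List, Sequence


def tdnn_layer(
    frames: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    context: Sequence[int] = (-2, 0, 2),  # hypothetical offsets, not from the paper
) -> List[List[float]]:
    """Apply one time-delay layer: output frame t sees input frames t + offset."""
    start = max(0, -min(context))
    stop = len(frames) - max(0, max(context))
    outputs = []
    for t in range(start, stop):
        # Splice the context frames into one long input vector.
        spliced = [x for off in context for x in frames[t + off]]
        out = []
        for w_row, b in zip(weights, bias):
            act = sum(w * x for w, x in zip(w_row, spliced)) + b
            out.append(max(0.0, act))  # ReLU non-linearity
        outputs.append(out)
    return outputs


# Toy input: 6 frames, each a 2-dimensional feature vector.
frames = [[float(t), 1.0] for t in range(6)]
# One output unit that sums all 3 * 2 spliced inputs.
weights = [[1.0] * 6]
bias = [0.0]

full = tdnn_layer(frames, weights, bias)        # 2 output frames
short = tdnn_layer(frames[:4], weights, bias)   # too short for this context: 0 frames
print(full, short)
```

In this sketch, each output frame needs input frames spanning the whole context window, so a shorter input leaves fewer output frames for any later pooling step. That is one intuition for why short utterances are harder, offered here as an illustration rather than as the paper's analysis.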
