ASAI · Sep 12, 2024

Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models

arXiv:2409.07770v2 · 2 citations · h-index: 9
AI Analysis

This work targets speaker verification, improving performance through fuller use of pre-trained models. The contribution is incremental, since it builds on existing SSL methods.

The paper tackles the underutilization of multi-layer SSL encoders in speaker verification by proposing L-TDNN, which processes layer-wise features to extract speaker vectors. It achieves the lowest error rates in the reported experiments while staying compact and keeping inference efficiency comparable to existing systems.

Recent advances in self-supervised learning (SSL) on Transformers have significantly improved speaker verification (SV) by providing domain-general speech representations. However, existing approaches have underutilized the multi-layered nature of SSL encoders. To address this limitation, we propose the layer-aware time-delay neural network (L-TDNN), which directly performs layer/frame-wise processing on the layer-wise hidden state outputs from pre-trained models, extracting fixed-size speaker vectors. L-TDNN comprises a layer-aware convolutional network, a frame-adaptive layer aggregation, and attentive statistics pooling, explicitly recognizing and processing the previously overlooked layer dimension. We evaluated L-TDNN across multiple speech SSL Transformers and diverse speech-speaker corpora against other approaches for leveraging pre-trained encoders. L-TDNN consistently demonstrated robust verification performance, achieving the lowest error rates throughout the experiments. At the same time, it stood out in model compactness and exhibited inference efficiency comparable to existing systems. These results highlight the advantages of the proposed layer-aware processing approach. Future work includes exploring joint training with SSL frontends and incorporating score calibration to further improve verification performance.
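
To make the data flow in the abstract concrete, here is a minimal NumPy sketch of two of the named stages: a frame-adaptive aggregation over the layer axis, followed by attentive statistics pooling over time. It is not the authors' implementation. The abstract does not give the exact formulation, so the scoring vectors `w_layer` and `w_att`, the softmax-based weighting, and the tensor sizes are all assumptions chosen for illustration, and the layer-aware convolutional network is omitted entirely.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frame_adaptive_layer_aggregation(hidden, w_layer):
    # hidden: (L, T, D) stack of layer-wise hidden states from an SSL encoder.
    # w_layer: (D,) hypothetical scoring vector (an assumption, not from the paper).
    scores = hidden @ w_layer                 # (L, T): one score per layer and frame
    weights = softmax(scores, axis=0)         # normalize over layers, separately per frame
    frames = (weights[..., None] * hidden).sum(axis=0)  # (T, D)
    return frames, weights

def attentive_statistics_pooling(frames, w_att):
    # frames: (T, D); w_att: (D,) hypothetical attention vector (an assumption).
    alpha = softmax(frames @ w_att, axis=0)   # (T,): attention weights over frames
    mean = (alpha[:, None] * frames).sum(axis=0)
    var = (alpha[:, None] * frames ** 2).sum(axis=0) - mean ** 2
    std = np.sqrt(np.clip(var, 1e-8, None))   # clip guards against tiny negative values
    return np.concatenate([mean, std])        # (2D,): fixed size regardless of T

rng = np.random.default_rng(0)
L, T, D = 13, 50, 8                           # illustrative sizes only
hidden = rng.standard_normal((L, T, D))
frames, layer_weights = frame_adaptive_layer_aggregation(hidden, rng.standard_normal(D))
embedding = attentive_statistics_pooling(frames, rng.standard_normal(D))
```

The point of the sketch is the shape bookkeeping: the layer axis is collapsed with weights that differ from frame to frame, and the time axis is then collapsed into a weighted mean and standard deviation, so utterances of any length map to a vector of the same size.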

Code Implementations: 1 repo