SD AI CL AS Feb 4, 2025

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation

Peidong Wang, Naoyuki Kanda, Jian Xue, Jinyu Li, Xiaofei Wang, Aswin Shanmugam Subramanian, Junkun Chen, Sunit Sivasankaran, Xiong Xiao, Yong Zhao

arXiv:2502.02683v1 4.0 h-index: 34

Originality: Incremental advance

AI Analysis

This work addresses streaming speaker change detection and gender classification for multi-talker speech translation systems. The advance is incremental because it builds on existing transducer-based methods.

The paper tackles both tasks by incorporating speaker embeddings into a transducer-based streaming speech translation model, and the authors report high accuracy on each.

Streaming multi-talker speech translation is a task that involves not only generating accurate and fluent translations with low latency but also recognizing when a speaker change occurs and what the speaker's gender is. Speaker change information can be used to create audio prompts for a zero-shot text-to-speech system, and gender can help to select speaker profiles in a conventional text-to-speech model. We propose to tackle streaming speaker change detection and gender classification by incorporating speaker embeddings into a transducer-based streaming end-to-end speech translation model. Our experiments demonstrate that the proposed methods can achieve high accuracy for both speaker change detection and gender classification.
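
As a rough illustration of how speaker embeddings can signal a speaker change in a streaming setting, the sketch below compares each incoming segment embedding with a running centroid of the current speaker and flags a change when the cosine similarity falls below a threshold. This is not the paper's method, since the abstract does not describe the mechanism. The function names, the threshold of 0.5, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math
from typing import Iterable, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_speaker_changes(
    embeddings: Iterable[Sequence[float]],
    threshold: float = 0.5,  # hypothetical value, would need tuning on real data
) -> List[int]:
    """Return indices of segments where a speaker change is flagged.

    Processes embeddings one at a time (streaming), keeping only a running
    sum and count for the current speaker's centroid.
    """
    changes: List[int] = []
    centroid_sum: List[float] = []
    count = 0
    for index, emb in enumerate(embeddings):
        if count == 0:
            # First segment: initialise the current speaker's centroid.
            centroid_sum = list(emb)
            count = 1
            continue
        centroid = [s / count for s in centroid_sum]
        if cosine_similarity(centroid, emb) < threshold:
            # Dissimilar to the current speaker: flag a change and reset.
            changes.append(index)
            centroid_sum = list(emb)
            count = 1
        else:
            # Same speaker: fold the new embedding into the centroid.
            centroid_sum = [s + x for s, x in zip(centroid_sum, emb)]
            count += 1
    return changes


# Toy 2-D embeddings (made up): two segments of speaker A, two of B, one of A.
toy_stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.1]]
print(detect_speaker_changes(toy_stream))  # [2, 4]
```

A gender classifier could be attached to the same embeddings in a similar spirit, for example as a small classifier head, but that design is likewise an assumption rather than something the abstract states.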
