SD · AI · CL · AS · Feb 4, 2025

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation

arXiv:2502.02683v1 · h-index: 34
Originality: Incremental advance
AI Analysis

This work addresses streaming speaker change detection and gender classification for multi-talker speech translation. The advance is incremental because it builds on existing transducer-based methods.

The paper tackles both tasks by incorporating speaker embeddings into a transducer-based streaming speech translation model, and reports high accuracy on each.

Streaming multi-talker speech translation is a task that involves not only generating accurate and fluent translations with low latency but also recognizing when a speaker change occurs and what the speaker's gender is. Speaker change information can be used to create audio prompts for a zero-shot text-to-speech system, and gender can help to select speaker profiles in a conventional text-to-speech model. We propose to tackle streaming speaker change detection and gender classification by incorporating speaker embeddings into a transducer-based streaming end-to-end speech translation model. Our experiments demonstrate that the proposed methods can achieve high accuracy for both speaker change detection and gender classification.
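
The abstract does not say how speaker embeddings are turned into a change decision. As a rough illustration of the general idea only, and not the paper's method, the sketch below flags a speaker change whenever the cosine similarity between consecutive segment embeddings drops below a threshold. The function names, the toy 3-dimensional embeddings, and the 0.5 threshold are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_speaker_changes(embeddings, threshold=0.5):
    """Return each index i where segment i looks like a different speaker
    than segment i - 1. The threshold is a hypothetical value, not one
    taken from the paper."""
    changes = []
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            changes.append(i)
    return changes


# Toy embeddings: the first two segments point one way, the last two another.
segments = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
]
changes = detect_speaker_changes(segments)
print(changes)  # [2]
```

Because each decision only compares a segment with the one before it, this kind of rule can run in a streaming setting. A real system would derive embeddings from audio and would likely learn the decision jointly with translation, as the abstract suggests.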
