AS, AI, SD, SP | Aug 18, 2025

Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings

arXiv:2508.14115v1 | h-index: 2 | MMSP
Originality: Incremental advance
AI Analysis

This work addresses the challenge of real-time speaker tracking for applications like audio processing, but it is incremental as it builds on existing embedding and tracking methods.

The paper tackles low-latency tracking of multiple speakers. It develops a Knowledge Distillation-based method to extract short-context speaker embeddings from overlapping speech, using beamforming to reduce overlap. The distilled models proved effective and more robust to overlap, though blockwise reassignment showed limitations in handling simultaneous speech.

Speaker embeddings are promising identity-related features that can enhance the identity assignment performance of a tracking system by leveraging its spatial predictions, i.e., by performing identity reassignment. Common speaker embedding extractors usually struggle with short temporal contexts and overlapping speech, which forces identity reassignment to operate over long time spans in order to exploit longer temporal contexts. However, this increases the probability of tracking system errors, which in turn negatively affects identity reassignment. To address this, we propose a Knowledge Distillation (KD) based training approach for short-context speaker embedding extraction from two-speaker mixtures. We leverage the spatial information of the speaker of interest using beamforming to reduce overlap. We study the feasibility of performing identity reassignment over blocks of fixed size, i.e., blockwise identity reassignment, as a step towards a low-latency, speaker-embedding-based tracking system. Results demonstrate that our distilled models are effective at short-context embedding extraction and more robust to overlap. However, blockwise reassignment results indicate that further work is needed to handle simultaneous speech more effectively.
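
To make the idea of blockwise identity reassignment more concrete, here is a minimal sketch, not the authors' implementation. It assumes that each track already yields one embedding per frame, that one reference embedding per known speaker is available, and that a block is summarized by averaging its embeddings before matching by cosine similarity. The function names, the averaging step, and the exhaustive permutation search are all illustrative assumptions; the abstract specifies none of them.

```python
import itertools
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def reassign_block(track_embeddings, reference_embeddings):
    """Assign one reference identity to each track for a single block.

    track_embeddings: (n_tracks, dim), one embedding per track (hypothetical input).
    reference_embeddings: (n_tracks, dim), one embedding per known speaker (assumed given).
    Returns a list where entry i is the identity index assigned to track i.
    """
    sim = cosine_similarity_matrix(track_embeddings, reference_embeddings)
    n = sim.shape[0]
    # Exhaustive search over permutations; fine for two or three speakers.
    best = max(
        itertools.permutations(range(n)),
        key=lambda perm: sum(sim[i, perm[i]] for i in range(n)),
    )
    return list(best)


def blockwise_reassignment(frame_embeddings, reference_embeddings, block_size):
    """Run identity reassignment independently on consecutive fixed-size blocks.

    frame_embeddings: (n_frames, n_tracks, dim) array of per-frame track embeddings.
    Returns one assignment list per block.
    """
    assignments = []
    for start in range(0, frame_embeddings.shape[0], block_size):
        block = frame_embeddings[start:start + block_size]
        # Assumption: a block is summarized by its mean embedding per track.
        block_mean = block.mean(axis=0)
        assignments.append(reassign_block(block_mean, reference_embeddings))
    return assignments
```

In this sketch a smaller `block_size` lowers latency but leaves less speech per decision, which is the trade-off the abstract points to when it notes that short contexts and overlapping speech make embeddings less reliable.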

