AS · AI · SD · Jun 23, 2025

Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers

arXiv:2506.19875v1 · 2 citations · h-index: 2 · EUSIPCO
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in speaker tracking for scenarios with intermittent and moving speakers, representing an incremental improvement over existing methods.

The paper tackles the problem of tracking intermittent and moving speakers, where purely spatial methods fail because trajectories are discontinuous. It proposes a post-tracking identity reassignment method based on speaker embeddings, which improves identity assignment performance for both neural and standard tracking systems.

Speaker tracking methods often rely on spatial observations to assign coherent track identities over time. This is limiting in scenarios with intermittent and moving speakers, i.e., speakers that may change position while they are inactive, which leads to discontinuous spatial trajectories. This paper investigates the use of speaker embeddings as a simple solution to this issue. We propose to perform identity reassignment post-tracking, using speaker embeddings. We leverage trajectory-related information provided by an initial tracking step, together with the multichannel audio signal. Beamforming is used to enhance the signal towards the speakers' positions in order to compute speaker embeddings. These are then used to assign new track identities based on an enrollment pool. We evaluate the performance of the proposed speaker embedding-based identity reassignment method on a dataset where speakers change position during inactivity periods. Results show that it consistently improves the identity assignment performance of neural and standard tracking systems. In particular, we study the impact of beamforming and input duration for embedding extraction.
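The reassignment step described in the abstract can be illustrated with a minimal sketch. It assumes that one embedding per track has already been computed from the beamformed signal, and that each track receives the identity of the most similar enrolled speaker under cosine similarity. The abstract does not state the scoring rule, so cosine similarity and the nearest-match rule are assumptions, and the names `cosine_similarity`, `reassign_identities`, `enrollment_pool`, and `track_embeddings` are hypothetical. The 3-dimensional vectors are toy values, not real speaker embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reassign_identities(track_embeddings, enrollment_pool):
    """Map each track id to the enrolled speaker whose embedding is most similar.

    Assumption: nearest-match by cosine similarity; the paper may use a
    different scoring or assignment rule.
    """
    new_ids = {}
    for track_id, emb in track_embeddings.items():
        new_ids[track_id] = max(
            enrollment_pool,
            key=lambda spk: cosine_similarity(emb, enrollment_pool[spk]),
        )
    return new_ids


# Toy enrollment pool: one reference embedding per known speaker.
enrollment_pool = {
    "spk_A": [1.0, 0.0, 0.0],
    "spk_B": [0.0, 1.0, 0.0],
}

# Toy track embeddings from an initial tracker. Tracks 0 and 2 stand for the
# same speaker, who moved while silent and was given a new track id.
track_embeddings = {
    0: [0.9, 0.1, 0.0],
    1: [0.1, 0.8, 0.1],
    2: [0.8, 0.2, 0.1],
}

new_ids = reassign_identities(track_embeddings, enrollment_pool)
print(new_ids)  # {0: 'spk_A', 1: 'spk_B', 2: 'spk_A'}
```

In this toy case, tracks 0 and 2 are merged under the same identity even though the tracker treated them as separate trajectories, which is the effect the abstract attributes to embedding-based reassignment.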

