SD AI CL LG | Sep 23, 2021

Joint speaker diarisation and tracking in switching state-space model

arXiv:2109.11140v12.3

Originality: Incremental advance

AI Analysis

This work addresses the challenge of accurate speaker diarisation in dynamic meeting environments, for applications such as transcription. It is an incremental advance, since it builds on prior work that uses location information.

The paper tackles the problem of speaker diarisation when speakers move during meetings by proposing a joint model that tracks speaker movements and performs diarisation simultaneously, using a state-space model implemented as a particle filter. Experiments on a Microsoft rich meeting transcription task show it performs comparably with other location-based methods.

Speakers may move around while diarisation is being performed. When a microphone array is used, the instantaneous locations from which sounds originate can be estimated, and previous investigations have shown that such information can be complementary to speaker embeddings in the diarisation task. However, these approaches often assume that speakers are fairly stationary throughout a meeting. This paper relaxes this assumption by explicitly tracking the movements of speakers while jointly performing diarisation within a unified model. A state-space model is proposed, where the hidden state expresses the identity of the current active speaker and the predicted locations of all speakers. The model is implemented as a particle filter. Experiments on a Microsoft rich meeting transcription task show that the proposed joint location tracking and diarisation approach performs comparably with other methods that use location information.
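To make the idea of a hidden state holding both the active speaker and every speaker's location more concrete, here is a minimal sketch of one bootstrap particle filter of that general shape. It is not the paper's implementation. Everything specific in it is an assumption made for illustration: locations are reduced to a single direction-of-arrival angle in degrees, speakers move by a Gaussian random walk, the active speaker persists with a fixed probability `p_stay`, the observation is a noisy angle of the active speaker only, and speaker embeddings are left out entirely. The function names and parameter values are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used as the observation likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, observation, rng,
                         p_stay=0.9, move_std=2.0, obs_std=5.0):
    """One bootstrap-filter update (illustrative assumptions throughout).

    Each particle is (active_speaker_index, [angle_of_each_speaker]).
    """
    num_speakers = len(particles[0][1])
    proposed, weights = [], []
    for active, locations in particles:
        # Switching part: keep the active speaker or jump to another one.
        if num_speakers > 1 and rng.random() > p_stay:
            active = rng.choice([k for k in range(num_speakers) if k != active])
        # Tracking part: every speaker's angle follows a random walk.
        locations = [x + rng.gauss(0.0, move_std) for x in locations]
        proposed.append((active, locations))
        # Only the active speaker's angle explains the observed direction.
        weights.append(gaussian_pdf(observation, locations[active], obs_std))
    if sum(weights) == 0.0:
        weights = [1.0] * len(proposed)  # degenerate case: fall back to uniform
    # Multinomial resampling in proportion to the weights.
    return rng.choices(proposed, weights=weights, k=len(proposed))


def most_likely_speaker(particles):
    """Diarisation decision: the active speaker held by most particles."""
    counts = {}
    for active, _ in particles:
        counts[active] = counts.get(active, 0) + 1
    return max(counts, key=counts.get)


def run_demo(seed=0):
    """Two speakers near 30 and 120 degrees; the talker changes half way."""
    rng = random.Random(seed)
    particles = [(rng.randrange(2), [30.0, 120.0]) for _ in range(500)]
    observations = [30.0] * 5 + [120.0] * 5
    decisions = []
    for y in observations:
        particles = particle_filter_step(particles, y, rng)
        decisions.append(most_likely_speaker(particles))
    return decisions


if __name__ == "__main__":
    print(run_demo())
```

Because each particle carries a full set of speaker locations together with a guess at who is talking, resampling keeps the hypotheses in which the active speaker's tracked position matches the observed direction. That coupling is what lets a single model of this kind perform tracking and diarisation jointly rather than in separate stages.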
