SDAS · Nov 30, 2020

Look who's not talking

arXiv:2011.14885v1 · 29 citations
Originality: Incremental advance
AI Analysis

This work offers an incremental improvement to speaker diarisation systems by simplifying the speech activity detection component, which is a major source of diarisation errors.

This paper addresses speaker diarisation of speech recordings captured 'in the wild' and proposes a new speech activity detection method. The method uses the norm of speaker embeddings as an indicator of speech activity, which removes the need for a separate model. It outperforms popular baselines on both in-house and public datasets.

The objective of this work is speaker diarisation of speech recordings 'in the wild'. The ability to determine speech segments is a crucial part of diarisation systems and accounts for a large proportion of errors. In this paper, we present a simple but effective solution for speech activity detection based on speaker embeddings. In particular, we discover that the norm of the speaker embedding is an extremely effective indicator of speech activity. The method does not require an independent model for speech activity detection, and therefore allows speaker diarisation to be performed using a unified representation for both speaker modelling and speech activity detection. We perform a number of experiments on in-house and public datasets, in which our method outperforms popular baselines.
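The core idea, that one embedding serves both speech activity detection and speaker modelling, can be illustrated with a minimal sketch. It assumes one speaker embedding per short analysis window, that speech windows yield larger norms than non-speech windows, and a fixed threshold chosen on development data. The function names, the threshold value, and the toy embeddings are hypothetical and are not the authors' implementation. The norm decides speech versus non-speech, while the L2-normalised direction is what a downstream clustering step would compare.

```python
import numpy as np


def speech_activity_from_norms(embeddings: np.ndarray, threshold: float) -> np.ndarray:
    """Flag each window as speech (True) when its embedding's L2 norm exceeds `threshold`.

    `embeddings` has shape (num_windows, dim). `threshold` is a hypothetical
    tuning parameter (e.g. picked on a development set), not a value from the paper.
    """
    norms = np.linalg.norm(embeddings, axis=1)
    return norms > threshold


def unit_directions(embeddings: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """L2-normalise embeddings so only their direction is kept for speaker modelling."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.maximum(norms, eps)


def speech_segments(is_speech: np.ndarray, hop_seconds: float) -> list[tuple[float, float]]:
    """Merge runs of consecutive speech windows into (start, end) times in seconds."""
    segments = []
    start = None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i  # a speech run begins
        elif not flag and start is not None:
            segments.append((start * hop_seconds, i * hop_seconds))
            start = None  # the speech run ended at window i
    if start is not None:  # close a run that lasts until the final window
        segments.append((start * hop_seconds, len(is_speech) * hop_seconds))
    return segments


if __name__ == "__main__":
    # Toy 4-dimensional embeddings (hypothetical): small norms stand in for
    # non-speech, large norms for speech from two different "speakers".
    embs = np.array([
        [0.1, 0.0, 0.0, 0.0],   # non-speech
        [5.0, 0.0, 0.0, 0.0],   # speaker A
        [4.5, 0.2, 0.0, 0.0],   # speaker A
        [0.0, 0.05, 0.0, 0.0],  # non-speech
        [0.0, 6.0, 0.0, 0.0],   # speaker B
    ])
    is_speech = speech_activity_from_norms(embs, threshold=1.0)
    print(speech_segments(is_speech, hop_seconds=0.5))  # [(0.5, 1.5), (2.0, 2.5)]
    # Directions of the speech windows are what a clustering stage would compare.
    speaker_dirs = unit_directions(embs[is_speech])
```

Because the decision reuses the embeddings already computed for speaker modelling, no separate speech activity model has to be trained or run; the trade-off is that performance hinges on how well the norm threshold transfers across recording conditions.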
