ML · Nov 20, 2015

PLDA with Two Sources of Inter-session Variability

arXiv:1511.06772v1
Originality: Synthesis-oriented
AI Analysis

This work addresses speaker recognition in scenarios such as the NIST SRE interviews. It is incremental, building on existing PLDA models.

The authors address speaker recognition when conversations are recorded simultaneously over multiple channels. They propose a modified PLDA model with two inter-session variability terms, one capturing conversation content and one capturing channel variability, and derive its equations. The model was applied in an earlier Interspeech 2013 paper.

In some speaker recognition scenarios, conversations are recorded simultaneously over multiple channels. This is the case for the interviews in the NIST SRE dataset. To take advantage of this, we propose a modification of the PLDA model that considers two different inter-session variability terms. The first term is tied across all the recordings belonging to the same conversation, whereas the second is not. Thus, the former mainly intends to capture the variability due to the phonetic content of the conversation, while the latter tries to capture the channel variability. In this document, we derive the equations for this model. This model was applied in the paper "Handling Recordings Acquired Simultaneously over Multiple Channels with PLDA", published at Interspeech 2013.
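The tying structure described in the abstract can be illustrated with a small generative sketch. It assumes a simplified linear-Gaussian form, phi_ijl = mu + V y_i + U x_ij + W z_ijl + eps_ijl, for speaker i, conversation j and channel l, with standard-normal latent factors and isotropic Gaussian residual noise. The matrix names V, U, W, the noise model, and the function names are hypothetical choices for illustration, not the paper's exact parametrization or training procedure. The code samples vectors from this assumed model and returns the covariances it implies between pairs of recordings.

```python
import numpy as np


def sample_two_session_plda(mu, V, U, W, noise_std, n_speakers, n_conv, n_chan, rng):
    """Sample phi[i, j, l] = mu + V y_i + U x_ij + W z_ijl + eps_ijl.

    Assumed simplified form (hypothetical, for illustration only):
      y_i   ~ N(0, I)  speaker factor, shared by every recording of speaker i
      x_ij  ~ N(0, I)  conversation factor, tied across all channels of conversation j
      z_ijl ~ N(0, I)  channel factor, drawn independently for each recording
      eps   ~ N(0, noise_std^2 I)  residual noise
    Returns an array of shape (n_speakers, n_conv, n_chan, dim).
    """
    dim = mu.shape[0]
    y = rng.standard_normal((n_speakers, V.shape[1]))
    x = rng.standard_normal((n_speakers, n_conv, U.shape[1]))
    z = rng.standard_normal((n_speakers, n_conv, n_chan, W.shape[1]))
    eps = noise_std * rng.standard_normal((n_speakers, n_conv, n_chan, dim))
    return (
        mu
        + (y @ V.T)[:, None, None, :]  # speaker term, broadcast over conversations and channels
        + (x @ U.T)[:, :, None, :]     # conversation term, broadcast over channels
        + z @ W.T                      # channel term, one per recording
        + eps
    )


def implied_covariances(V, U, W, noise_std):
    """Covariances between two recordings implied by the assumed model."""
    dim = V.shape[0]
    speaker = V @ V.T
    conversation = U @ U.T
    return {
        # a recording with itself: all four terms contribute
        "same_recording": speaker + conversation + W @ W.T + noise_std**2 * np.eye(dim),
        # simultaneous recordings of one conversation share y_i and x_ij
        "same_conversation": speaker + conversation,
        # different conversations of one speaker share only y_i
        "same_speaker": speaker,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 0.5]])
    U = np.array([[0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
    W = np.array([[0.3], [0.3], [0.3]])
    phi = sample_two_session_plda(np.zeros(3), V, U, W, 0.1, 50_000, 1, 2, rng)
    a, b = phi[:, 0, 0, :], phi[:, 0, 1, :]  # two channels of the same conversation
    empirical = (a - a.mean(0)).T @ (b - b.mean(0)) / len(a)
    print(np.round(empirical, 2))
    print(implied_covariances(V, U, W, 0.1)["same_conversation"])
```

Under this assumed form, two simultaneous recordings of one conversation covary through VVᵀ + UUᵀ, while two different conversations of the same speaker covary only through VVᵀ. That extra shared term is what separates the conversation-tied factor from the per-recording channel factor.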
