AS · SD · Oct 31, 2018

Latent variable approach to diarization of audio recordings using ad-hoc randomly placed mobile devices

arXiv:1810.13109v1
Originality: Incremental advance
AI Analysis

This paper addresses speaker diarization (determining who spoke when) in noisy, multi-device environments. It is an incremental advance because it builds on existing spatial and statistical methods.

The paper tackles speaker diarization of audio recordings made with multiple ad-hoc mobile devices by jointly modeling directional statistics with a Dirichlet mixture model, achieving a diarization error rate (DER) below 14% in real-life experiments.
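The fitting procedure is not described in this summary, so the following is only a minimal sketch of how a Dirichlet mixture can turn per-device directional statistics into frame-wise speaker posteriors. It assumes that each frame is represented on each device by a vector on the probability simplex, that each mixture component (one per speaker) has its own Dirichlet per device with devices independent given the component, and that the number of speakers is known. The M-step uses weighted moment matching, an approximation rather than exact maximum-likelihood estimation. All function names are hypothetical.

```python
import math
import numpy as np


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at every row of x (rows lie on the simplex)."""
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    return log_norm + np.log(np.clip(x, 1e-12, 1.0)) @ (alpha - 1.0)


def posteriors(feats, weights, alphas):
    """E-step: p(component | frame); devices are assumed independent given the component."""
    T, C = feats[0].shape[0], len(weights)
    log_p = np.empty((T, C))
    for c in range(C):
        log_p[:, c] = np.log(weights[c] + 1e-12) + sum(
            dirichlet_logpdf(x, alphas[c][d]) for d, x in enumerate(feats))
    log_p -= log_p.max(axis=1, keepdims=True)       # subtract row max for numerical stability
    post = np.exp(log_p)
    return post / post.sum(axis=1, keepdims=True)


def moment_match(x, r):
    """Approximate M-step: weighted moment matching (not exact maximum likelihood)."""
    w = (r + 1e-12) / (r + 1e-12).sum()
    m = w @ x                                        # weighted mean per bin
    v = w @ (x - m) ** 2                             # weighted variance per bin
    # For a Dirichlet with precision s: sum_k Var[x_k] = (1 - sum_k m_k^2) / (s + 1)
    s = (1.0 - np.sum(m ** 2)) / max(v.sum(), 1e-12) - 1.0
    return np.maximum(m * max(s, 1e-3), 1e-3)


def fit_dmm(feats, n_comp, n_iter=50, seed=0):
    """Fit the mixture; feats is a list of (frames x bins) arrays, one per device."""
    rng = np.random.default_rng(seed)
    X = np.hstack(feats)
    # Farthest-point initialisation so the components start on well-separated frames
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, n_comp):
        d2 = np.min([((X - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    labels = np.argmin([((X - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
    resp = 0.9 * np.eye(n_comp)[labels] + 0.1 / n_comp   # soft initial responsibilities
    for _ in range(n_iter):
        weights = resp.mean(axis=0)
        alphas = [[moment_match(x, resp[:, c]) for x in feats] for c in range(n_comp)]
        resp = posteriors(feats, weights, alphas)
    return resp
```

A frame-level diarization labeling then follows from `fit_dmm(feats, n_speakers).argmax(axis=1)`, one component index per frame. Moment matching keeps the sketch short; a faithful implementation would more likely use an iterative maximum-likelihood update for the Dirichlet parameters.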

Diarization of audio recordings from ad-hoc mobile devices using spatial information is considered in this paper. A two-channel synchronous recording is assumed for each mobile device, and it is used to compute directional statistics separately at each device in a frame-wise manner. The recordings across the mobile devices are asynchronous, but a coarse synchronization is performed by aligning the signals using acoustic events or the real-time clock. The directional statistics computed for all the devices are then modeled jointly using a Dirichlet mixture model, and the posterior probability over the mixture components is used to derive the diarization information. Experiments on real-life recordings using mobile phones show a diarization error rate of less than 14%.
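The abstract does not say which directional statistic is computed from each device's two channels. As a minimal sketch, assume it is a per-frame histogram of inter-channel delay estimates obtained from the phase difference of the short-time Fourier transform, normalized so each frame lies on the probability simplex (the domain a Dirichlet model requires). The function `direction_histograms` and its parameters (microphone spacing `mic_dist`, bin count, frame length, hop, speed of sound `c`) are hypothetical placeholders, not values from the paper.

```python
import numpy as np


def direction_histograms(left, right, fs, mic_dist=0.1, n_bins=12,
                         frame_len=1024, hop=512, c=343.0):
    """Per-frame normalized histograms of inter-channel delay estimates (seconds)."""
    max_tau = mic_dist / c                             # largest physically possible delay
    edges = np.linspace(-max_tau, max_tau, n_bins + 1)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)[1:]  # drop DC, which carries no phase
    # Above c / (2 * mic_dist) the phase difference can wrap (spatial aliasing)
    valid = freqs < c / (2.0 * mic_dist)
    n_frames = 1 + (len(left) - frame_len) // hop
    hists = np.empty((n_frames, n_bins))
    for t in range(n_frames):
        seg = slice(t * hop, t * hop + frame_len)
        L = np.fft.rfft(window * left[seg])[1:]
        R = np.fft.rfft(window * right[seg])[1:]
        cross = L * np.conj(R)
        # Phase of L * conj(R) is 2*pi*f*tau, where tau > 0 means `right` lags `left`
        tau = np.angle(cross[valid]) / (2.0 * np.pi * freqs[valid])
        h, _ = np.histogram(tau, bins=edges, weights=np.abs(cross[valid]))
        h = h + 1e-3                                   # keep every bin > 0 (log(0) is undefined)
        hists[t] = h / h.sum()
    return hists
```

Each row is one frame's direction distribution for one device. Computing these separately per device, after the coarse synchronization described above, would give one (frames x bins) matrix per device, which is the kind of input a joint Dirichlet mixture can model.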
