AS SD Oct 31, 2018

Latent variable approach to diarization of audio recordings using ad-hoc randomly placed mobile devices

Srikanth Raj Chetupalli, Anirban Bhowmick, Thippur V. Sreenivas

arXiv:1810.13109v1

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of speaker diarization in noisy, multi-device environments; the advance is incremental because it builds on existing spatial and statistical methods.

The paper tackles the problem of speaker diarization from audio recordings using multiple ad-hoc mobile devices by jointly modeling directional statistics with a Dirichlet mixture model, achieving a diarization error rate of less than 14% in real-life experiments.

This paper considers diarization of audio recordings from ad-hoc mobile devices using spatial information. A two-channel synchronous recording is assumed for each mobile device and is used to compute directional statistics separately at each device, frame by frame. The recordings across the mobile devices are asynchronous, but a coarse synchronization is performed by aligning the signals using acoustic events or a real-time clock. The directional statistics computed for all the devices are then modeled jointly using a Dirichlet mixture model, and the posterior probability over the mixture components is used to derive the diarization information. Experiments on real-life recordings using mobile phones show a diarization error rate of less than 14%.
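The final step described above, reading diarization labels off the posterior over mixture components, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each frame has already been reduced to a probability vector over direction bins with strictly positive entries, and that the mixture weights and Dirichlet parameters have already been estimated. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Log density of Dirichlet(alpha) at x, a strictly positive point on the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def component_posteriors(x, weights, alphas):
    """Posterior p(k | x) over mixture components for a single frame."""
    log_joint = [math.log(w) + dirichlet_logpdf(x, a)
                 for w, a in zip(weights, alphas)]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint)
    unnorm = [math.exp(v - peak) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def diarize(frames, weights, alphas):
    """Assign each frame to the component with the highest posterior."""
    labels = []
    for x in frames:
        post = component_posteriors(x, weights, alphas)
        labels.append(max(range(len(post)), key=post.__getitem__))
    return labels


# Hypothetical two-component mixture over three direction bins (assumed values).
weights = [0.5, 0.5]
alphas = [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]]

# Hypothetical per-frame direction statistics, each summing to 1.
frames = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.7, 0.2, 0.1]]

labels = diarize(frames, weights, alphas)
print(labels)
```

In this sketch each mixture component stands in for one speaker position, so the per-frame label sequence plays the role of the diarization output. Estimating the mixture parameters (for example with an EM-style procedure) is outside its scope.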
