SD AS · Nov 15, 2019

Sample Drop Detection for Distant-speech Recognition with Asynchronous Devices Distributed in Space

Tina Raissi, Santiago Pascual, Maurizio Omologo

arXiv:1911.06713v1

Originality: Incremental advance

AI Analysis

This work addresses synchronization issues in multi-device speech recognition, with applications such as conversational analysis. The advance is incremental, since it builds on existing neural methods for signal processing.

The paper tackles the problem of detecting sample drops in asynchronous multi-microphone speech recordings. It designs a neural model built on a CNN-LSTM encoder followed by multi-head attention, which achieves an F1 score of 88% on an evaluation set extracted from the CHiME-5 corpus and comparable results on artificial data.

In many applications of multi-microphone multi-device processing, the synchronization among different input channels can be affected by the lack of a common clock and by isolated drops of samples. In this work, we address the issue of sample drop detection in the context of a conversational speech scenario, recorded by a set of microphones distributed in space. The goal is to design a neural-based model that, given a short window in the time domain, detects whether one or more devices have been subjected to a sample drop event. The candidate time windows are selected from a set of large time intervals, each possibly including a sample drop, by means of a preprocessing step. This step is based on the application of normalized cross-correlation between signals acquired by different devices. The architecture of the neural network relies on a CNN-LSTM encoder, followed by multi-head attention. The experiments are conducted using both artificial and real data. Our proposed approach obtained an F1 score of 88% on an evaluation set extracted from the CHiME-5 corpus. A comparable performance was found in a larger set of experiments conducted on a set of multi-channel artificial scenes.
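
The abstract gives no implementation details for the preprocessing step, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes two single-channel signals, a fixed window length, and a small lag search range. It estimates the lag that maximizes the normalized cross-correlation in each window and flags a window as a candidate when that lag changes from the previous window. The function names `ncc_lag` and `find_candidate_windows`, and the defaults `win=1000` and `max_lag=20`, are hypothetical choices made for illustration.

```python
import numpy as np


def ncc_lag(ref, sig, max_lag):
    """Return (lag, peak) maximizing the normalized cross-correlation.

    A positive lag means `sig` is delayed relative to `ref`;
    a negative lag means `sig` runs ahead of `ref`.
    """
    best_lag, best_val = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Align the overlapping parts of the two windows for this lag.
        if lag >= 0:
            a, b = ref[:len(ref) - lag], sig[lag:]
        else:
            a, b = ref[-lag:], sig[:len(sig) + lag]
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        val = np.dot(a, b) / denom if denom > 0 else 0.0
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val


def find_candidate_windows(ref, sig, win=1000, max_lag=20):
    """Return (window_start, lag_change) for windows whose lag differs
    from the lag of the preceding window (assumed heuristic)."""
    n = min(len(ref), len(sig))
    starts = list(range(0, n - win + 1, win))
    lags = [ncc_lag(ref[s:s + win], sig[s:s + win], max_lag)[0] for s in starts]
    return [
        (s, cur - prev)
        for s, prev, cur in zip(starts[1:], lags, lags[1:])
        if cur != prev
    ]


# Synthetic demo: white noise as the reference channel, and a second
# channel that loses 5 consecutive samples starting at index 4000.
rng = np.random.default_rng(0)
ref = rng.standard_normal(8000)
sig = np.delete(ref, slice(4000, 4005))

candidates = find_candidate_windows(ref, sig)
print(candidates)
```

In this synthetic case a drop of k samples makes the second channel run ahead of the reference by k samples, so the lag change reported for the flagged window is negative. Real recordings would also exhibit clock drift and reverberation, which this sketch does not model; in the paper, the final decision on each candidate window is made by the neural model rather than by the correlation step.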
