SDAS | Nov 15, 2019

Sample Drop Detection for Distant-speech Recognition with Asynchronous Devices Distributed in Space

arXiv:1911.06713v1
Originality: Incremental advance
AI Analysis

This work addresses synchronization issues in multi-device speech recognition, which matter for applications such as conversational analysis. It is an incremental advance because it builds on existing neural methods for signal processing.

The paper tackles the problem of detecting sample drops in asynchronous multi-microphone speech recordings. It designs a neural model built on a CNN-LSTM encoder followed by multi-head attention, which achieves an F1 score of 88% on an evaluation set extracted from the CHiME-5 corpus and comparable results on artificial data.
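To make the target event concrete, here is a minimal sketch of how a sample drop might be simulated when building artificial multi-channel scenes, and how fixed-length windows could be labeled for a window-level detector. The helper names `simulate_sample_drop` and `window_labels`, the zero-padding at the end, the 16 kHz rate, and the window/hop sizes are all hypothetical choices for illustration, not the authors' data-generation procedure.

```python
import numpy as np

def simulate_sample_drop(x, drop_pos, drop_len):
    """Hypothetical helper: delete drop_len samples starting at drop_pos,
    then zero-pad the end so the output keeps the input length."""
    y = np.concatenate([x[:drop_pos], x[drop_pos + drop_len:]])
    return np.pad(y, (0, drop_len))

def window_labels(n_samples, win, hop, drop_pos):
    """Hypothetical helper: label a window 1 if the discontinuity at
    drop_pos lies strictly inside it, else 0."""
    starts = np.arange(0, n_samples - win + 1, hop)
    return ((starts < drop_pos) & (drop_pos < starts + win)).astype(int)

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)  # 1 s of noise at an assumed 16 kHz
dev = simulate_sample_drop(ref, drop_pos=8000, drop_len=40)

# Before the drop the two channels agree sample by sample.
print(np.allclose(ref[:8000], dev[:8000]))             # True
# After it, the second channel runs 40 samples ahead of the reference.
print(np.allclose(ref[8040:16000], dev[8000:15960]))   # True
print(window_labels(16000, win=4000, hop=2000, drop_pos=8000))  # [0 0 0 1 0 0 0]
```

Requiring the drop to fall strictly inside a window means that a window starting exactly at the drop point contains only post-drop samples and gets labeled 0. The alignment change shows up only across the window boundary.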

In many applications of multi-microphone, multi-device processing, the synchronization among input channels can be affected by the lack of a common clock and by isolated drops of samples. In this work, we address sample drop detection in a conversational speech scenario recorded by a set of microphones distributed in space. The goal is to design a neural model that, given a short window in the time domain, detects whether one or more devices have been subjected to a sample drop event. The candidate time windows are selected, by means of a preprocessing step, from a set of large time intervals that may include a sample drop. This preprocessing step applies normalized cross-correlation between the signals acquired by different devices. The neural network architecture relies on a CNN-LSTM encoder followed by multi-head attention. The experiments are conducted on both artificial and real data. Our proposed approach obtained an F1 score of 88% on an evaluation set extracted from the CHiME-5 corpus. Comparable performance was found in a larger set of experiments conducted on multi-channel artificial scenes.
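The abstract says only that candidate windows are selected using normalized cross-correlation between devices. One plausible reading is sketched below: estimate the inter-device lag in consecutive windows and flag the boundaries where that lag jumps, since a drop of k samples shifts the lag by k. The functions `ncc_lag`, `lag_track`, and `candidate_windows`, the search range, and the jump tolerance are hypothetical and are not the authors' implementation.

```python
import numpy as np

def ncc_lag(a, b, max_lag):
    """Lag in [-max_lag, max_lag] maximizing the normalized cross-correlation
    of overlapping parts; positive lag means a[t + lag] matches b[t]."""
    n = len(a)
    best_lag, best_val = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        x, y = (a[lag:], b[:n - lag]) if lag >= 0 else (a[:n + lag], b[-lag:])
        x, y = x - x.mean(), y - y.mean()  # remove DC before correlating
        val = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def lag_track(ref, dev, win, hop, max_lag):
    """Estimated lag of dev relative to ref in each analysis window."""
    starts = range(0, len(ref) - win + 1, hop)
    return np.array([ncc_lag(ref[s:s + win], dev[s:s + win], max_lag) for s in starts])

def candidate_windows(lags, tol=1):
    """Indices i where the lag changes by more than tol between windows i and i+1."""
    return np.flatnonzero(np.abs(np.diff(lags)) > tol)

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
# Second device loses 40 samples at index 8000 (end zero-padded to keep the length).
dev = np.concatenate([ref[:8000], ref[8040:], np.zeros(40)])

lags = lag_track(ref, dev, win=2000, hop=2000, max_lag=100)
print(lags)                     # [ 0  0  0  0 40 40 40 40]
print(candidate_windows(lags))  # [3]: the drop lies between windows 3 and 4
```

On real far-field recordings, reverberation, clock drift, and multiple sources make the lag estimate noisy. This is presumably why the paper uses such a step only to narrow down candidate windows and leaves the final decision to the neural classifier.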

Code Implementations: 1 repo