SDAS, Aug 20, 2018

Deep Residual Network for Sound Source Localization in the Time Domain

arXiv:1808.06429v1, 26 citations
Originality: Incremental advance
AI Analysis

This work addresses sound source localization for speech recognition systems, presenting an incremental improvement over existing methods.

The study tackled sound source localization in the time domain using a deep residual neural network, achieving 99.2% accuracy for 30 ms sound frames and a standard deviation of 4° in localization, while reducing word error rate by 1.14% in a speech recognition pipeline compared to GCC-PHAT.

This study presents a system for sound source localization in the time domain using a deep residual neural network. The network estimates direction from data recorded by a linear 8-channel microphone array with 3 cm spacing. We propose using a deep residual network for sound source localization, treating the localization task as a classification task. This study describes the gathered dataset and the developed neural network architecture, and presents the training process and its results. The developed system was tested on the validation part of the dataset and on new data captured in real time. The classification accuracy for 30 ms sound frames is 99.2%. The standard deviation of sound source localization is 4°. The proposed method was also tested inside a speech recognition pipeline, where it decreased word error rate by 1.14% in comparison with a similar speech recognition pipeline using GCC-PHAT sound source localization.
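For readers unfamiliar with the GCC-PHAT baseline mentioned in the abstract, the following is a minimal sketch of the general technique, not the implementation used in the paper. It estimates the time delay between two microphone signals and converts it to a far-field angle. The function names, the 16 kHz sample rate, the 343 m/s speed of sound, and the choice of the outermost microphone pair (7 × 3 cm = 0.21 m apart) are all assumptions made for illustration; only the 3 cm spacing, the 8 channels, and the 30 ms frame length come from the abstract.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=1):
    """Estimate the delay (seconds) of `sig` relative to `ref` using GCC-PHAT.

    A positive result means `sig` lags `ref`.
    """
    n = sig.shape[0] + ref.shape[0]            # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)                 # cross-power spectrum
    cross /= np.abs(cross) + 1e-12             # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=interp * n)     # (optionally interpolated) cross-correlation
    max_shift = interp * n // 2
    if max_tau is not None:
        # Restrict the search to physically possible delays (at least one lag).
        max_shift = max(1, min(int(interp * fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(interp * fs)

def delay_to_angle(tau, mic_distance, speed_of_sound=343.0):
    """Convert a delay to a far-field angle in degrees, measured from broadside."""
    ratio = np.clip(tau * speed_of_sound / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))

# Synthetic check: one 30 ms frame at an assumed 16 kHz, delayed by 5 samples.
fs = 16000                                     # assumed sample rate (not stated in the abstract)
frame_len = int(0.030 * fs)                    # 30 ms frame, as in the abstract
true_delay = 5                                 # samples, chosen arbitrarily for the demo
rng = np.random.default_rng(0)
noise = rng.standard_normal(frame_len + true_delay)
ref = noise[true_delay:]                       # reference microphone hears the sound first
sig = noise[:frame_len]                        # second microphone hears it 5 samples later

aperture = 7 * 0.03                            # outermost pair of the 8-mic, 3 cm array
tau = gcc_phat(sig, ref, fs, max_tau=aperture / 343.0, interp=4)
angle = delay_to_angle(tau, aperture)
```

With only 3 cm between adjacent microphones, the largest possible delay is well under two samples at this assumed sample rate, which is why the sketch uses the widest pair and interpolates the correlation. The approach in the abstract avoids an explicit delay estimate altogether by classifying raw multichannel frames directly.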

