SD · AS · Jun 4, 2020

A study on more realistic room simulation for far-field keyword spotting

arXiv:2006.02774v3 · 12 citations
AI Analysis

This work addresses robust keyword spotting in noisy far-field conditions. Its contribution is an incremental improvement in the room simulation techniques used to generate training data.

The study improves far-field keyword spotting by training on data generated with more realistic room simulation. Without any fine-tuning on in-domain data, this yields up to 35.8% relative improvement over the commonly used image source method with a single absorption coefficient.

We investigate the impact of more realistic room simulation for training far-field keyword spotting systems without fine-tuning on in-domain data. To this end, we study the impact of incorporating the following factors in the room impulse response (RIR) generation: air absorption, surface- and frequency-dependent coefficients of real materials, and stochastic ray tracing. Through an ablation study, a wake word task is used to measure the impact of these factors in comparison with a ground-truth set of measured RIRs. On a hold-out set of re-recordings under clean and noisy far-field conditions, we demonstrate up to 35.8% relative improvement over the commonly used (single absorption coefficient) image source method. Source code is made available in the Pyroomacoustics package, allowing others to incorporate these techniques in their work.

Code Implementations: 1 repo
