SD, AS · Sep 9, 2019

Impulse Response Data Augmentation and Deep Neural Networks for Blind Room Acoustic Parameter Estimation

arXiv:1909.03642v2 · 13 citations
AI Analysis

This work addresses the cost of collecting data for blind acoustic parameter estimation and offers a practical solution for audio processing applications. It is incremental in that it builds on existing CNN methods.

The authors address blind room acoustic parameter estimation by proposing an impulse response data augmentation method that generates a large, balanced dataset from a small set of real impulse responses. Neural networks trained on the augmented data outperform previous state-of-the-art methods, and a new CNN runs 4-5x faster with comparable or better performance.

The reverberation time (T60) and the direct-to-reverberant ratio (DRR) are commonly used to characterize room acoustic environments. Both parameters can be measured from an acoustic impulse response (AIR) or using blind estimation methods that perform estimation directly from speech. When neural networks are used for blind estimation, however, a large realistic dataset is needed, which is expensive and time-consuming to collect. To address this, we propose an AIR augmentation method that can parametrically control the T60 and DRR, allowing us to expand a small dataset of real AIRs into a balanced dataset orders of magnitude larger. Using this method, we train a previously proposed convolutional neural network (CNN) and show we can outperform past single-channel state-of-the-art methods. We then propose a more efficient, straightforward baseline CNN that is 4-5x faster, which provides an additional improvement and is better or comparable to all previously reported single- and multi-channel state-of-the-art methods.
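To make the two parameters concrete, the following is a minimal Python sketch of how T60 and DRR can be measured from an AIR, and of one simple way they could be adjusted parametrically. It is an illustration under stated assumptions, not the authors' procedure: the AIR is a synthetic toy (a unit impulse plus exponentially decaying noise), the 2.5 ms direct-path window and the 16 kHz sample rate are assumed values, and all function names (`synth_air`, `split_index`, `drr_db`, `t60_seconds`, `set_drr`, `set_t60`) are hypothetical.

```python
import math
import random

FS = 16000        # sample rate in Hz (assumed for this toy example)
DIRECT_MS = 2.5   # assumed length of the direct-path window after the peak


def synth_air(t60, length_s=1.0, seed=0):
    """Toy AIR: unit direct-path impulse plus exponentially decaying noise."""
    rng = random.Random(seed)
    rate = math.log(1000.0) / t60  # amplitude falls by 60 dB in t60 seconds
    air = [0.1 * rng.gauss(0.0, 1.0) * math.exp(-rate * i / FS)
           for i in range(int(FS * length_s))]
    air[0] = 1.0
    return air


def split_index(air):
    """Index where the direct-path window ends (peak position + DIRECT_MS)."""
    peak = max(range(len(air)), key=lambda i: abs(air[i]))
    return peak + int(FS * DIRECT_MS / 1000.0) + 1


def drr_db(air, split):
    """Direct-to-reverberant ratio: direct energy over remaining energy, in dB."""
    direct = sum(x * x for x in air[:split])
    reverb = sum(x * x for x in air[split:])
    return 10.0 * math.log10(direct / reverb)


def t60_seconds(air, lo_db=-5.0, hi_db=-25.0):
    """T60 from a line fit to the Schroeder decay curve between -5 and -25 dB."""
    edc, total = [0.0] * len(air), 0.0
    for i in range(len(air) - 1, -1, -1):  # backward integration of energy
        total += air[i] * air[i]
        edc[i] = total
    pts = [(i / FS, 10.0 * math.log10(e / total))
           for i, e in enumerate(edc) if e > 0.0]
    pts = [(t, d) for t, d in pts if hi_db <= d <= lo_db]
    mt = sum(t for t, _ in pts) / len(pts)
    md = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mt) * (d - md) for t, d in pts)
             / sum((t - mt) ** 2 for t, _ in pts))  # dB per second, negative
    return -60.0 / slope  # extrapolate the fitted slope to a 60 dB drop


def set_drr(air, split, target_db):
    """Scale the direct-path window so that the DRR equals target_db."""
    gain = 10.0 ** ((target_db - drr_db(air, split)) / 20.0)
    return [x * gain for x in air[:split]] + air[split:]


def set_t60(air, split, target_t60):
    """Reshape the tail's exponential decay so the T60 moves to target_t60."""
    old_rate = math.log(1000.0) / t60_seconds(air)
    new_rate = math.log(1000.0) / target_t60
    tail = [x * math.exp(-(new_rate - old_rate) * (i - split) / FS)
            for i, x in enumerate(air[split:], start=split)]
    return air[:split] + tail


air = synth_air(t60=0.5)
split = split_index(air)
shorter = set_t60(air, split, target_t60=0.3)   # adjust T60 first
out = set_drr(shorter, split, target_db=3.0)    # then fix the DRR
print(t60_seconds(air), t60_seconds(shorter), drr_db(out, split))
```

In this sketch the T60 is adjusted before the DRR because reshaping the tail changes the reverberant energy and therefore the DRR. Real AIRs also contain a noise floor, which this toy ignores; lengthening the decay of a measured AIR would amplify that floor unless it is handled separately.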
