ASSD, Apr 2, 2020

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

arXiv:2004.00932v2, 23 citations
Originality: Incremental advance
AI Analysis

This work addresses speech intelligibility enhancement for communication in adverse acoustic environments, representing an incremental improvement over existing methods.

The paper tackles degraded speech intelligibility in noisy environments by proposing iMetricGAN, a deep learning-based speech modification method that enhances intelligibility while keeping the RMS level and duration of the signal unchanged. In the reported experiments, iMetricGAN outperformed conventional state-of-the-art algorithms on the objective measures SIIB and ESTOI under a cafeteria noise condition, and formal listening tests showed significant intelligibility gains when both noise and reverberation were present.

The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments. In this work, we propose a deep learning-based speech modification method to compensate for the intelligibility loss, with the constraint that the root mean square (RMS) level and duration of the speech signal are maintained before and after modifications. Specifically, we utilize an iMetricGAN approach to optimize the speech intelligibility metrics with generative adversarial networks (GANs). Experimental results show that the proposed iMetricGAN outperforms conventional state-of-the-art algorithms in terms of objective measures, i.e., speech intelligibility in bits (SIIB) and extended short-time objective intelligibility (ESTOI), under a Cafeteria noise condition. In addition, formal listening tests reveal significant intelligibility gains when both noise and reverberation exist.
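
The RMS and duration constraint mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names `rms` and `match_rms`, the `eps` guard, and the toy signals are all assumptions made for illustration. It only shows one plausible way to rescale a modified signal so that its RMS level matches the original while the sample count stays the same.

```python
import math

def rms(signal):
    """Root mean square level of a non-empty sequence of samples."""
    if len(signal) == 0:
        raise ValueError("signal must contain at least one sample")
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def match_rms(modified, reference, eps=1e-12):
    """Rescale `modified` so that its RMS equals the RMS of `reference`.

    Hypothetical helper: both signals must have the same number of samples,
    which stands in for the equal-duration constraint.
    """
    if len(modified) != len(reference):
        raise ValueError("duration constraint violated: lengths differ")
    # eps guards against division by zero for an all-zero modified signal.
    gain = rms(reference) / max(rms(modified), eps)
    return [gain * x for x in modified]

# Toy data (assumed): a low-level sine as the "original" speech, and a louder,
# slightly perturbed copy standing in for the output of a modification model.
reference = [0.1 * math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
modified = [3.0 * x + 0.02 * ((-1) ** n) for n, x in enumerate(reference)]

constrained = match_rms(modified, reference)
```

After the call, `constrained` has the same length as `reference` and, up to floating-point error, the same RMS level, so any intelligibility gain would have to come from how the energy is redistributed rather than from simply making the signal louder.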

Code Implementations: 1 repo
