Phonetic Feedback for Speech Enhancement With and Without Parallel Speech Data
This work addresses the challenge of improving speech intelligibility in enhancement systems for applications such as hearing aids and communication devices, though its contribution is incremental because it builds on existing techniques.
The paper incorporates phonetic feedback into speech enhancement systems and reports gains in objective intelligibility scores on CHiME-4 data, even without parallel speech data.
While deep learning systems have gained significant ground in speech enhancement research, they have yet to exploit the full potential of deep learning to provide high-level feedback. In particular, phonetic feedback is rare in speech enhancement research even though it includes valuable top-down information. We use the technique of mimic loss to provide phonetic feedback to an off-the-shelf enhancement system, and find gains in objective intelligibility scores on CHiME-4 data. This technique uses a frozen acoustic model trained on clean speech to provide valuable feedback to the enhancement model, even in the case where no parallel speech data is available. Our work is one of the first to show intelligibility improvement for neural enhancement systems without parallel speech data, and we show that phonetic feedback can improve a state-of-the-art neural enhancement system trained with parallel speech data.
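To make the idea of phonetic feedback from a frozen acoustic model concrete, the following is a minimal sketch, not the system described above. It assumes a toy acoustic model (a single fixed linear layer, `FROZEN_WEIGHTS`), hypothetical 3-dimensional feature frames, and two phone classes. `mimic_loss` compares the acoustic model's outputs on enhanced and clean frames, which requires parallel data. `phone_classification_loss` is an assumption about one plausible way to obtain feedback when no clean reference exists: it scores enhanced frames against phone labels instead. All names and values are illustrative.

```python
import math

# Hypothetical stand-in for a frozen acoustic model trained on clean speech:
# one fixed linear layer mapping a 3-dim feature frame to 2 phone logits.
# These weights are never updated; only the enhancement model would be trained.
FROZEN_WEIGHTS = [[1.0, -1.0, 0.5],
                  [-0.5, 2.0, 0.0]]


def acoustic_logits(frame):
    """Return the frozen acoustic model's phone logits for one frame."""
    return [sum(w * x for w, x in zip(row, frame)) for row in FROZEN_WEIGHTS]


def mimic_loss(enhanced, clean):
    """Mean squared distance between acoustic-model outputs on enhanced
    and clean frames. Requires parallel (paired) data."""
    total, count = 0.0, 0
    for e_frame, c_frame in zip(enhanced, clean):
        for e, c in zip(acoustic_logits(e_frame), acoustic_logits(c_frame)):
            total += (e - c) ** 2
            count += 1
    return total / count


def phone_classification_loss(enhanced, phone_labels):
    """Mean cross-entropy of the frozen model's predictions on enhanced
    frames against phone labels. Assumed variant: no clean reference needed."""
    total = 0.0
    for frame, label in zip(enhanced, phone_labels):
        logits = acoustic_logits(frame)
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_norm - logits[label]
    return total / len(enhanced)


# Toy data: two frames of clean features and an imperfect enhanced version.
clean = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
enhanced = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]
labels = [0, 1]

print("mimic loss:", mimic_loss(enhanced, clean))
print("phone classification loss:", phone_classification_loss(enhanced, labels))
```

In a real system, either loss would be backpropagated through the frozen acoustic model into the enhancement model's parameters; this sketch only computes the loss values.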