SD LG AS

Dec 18, 2017

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

Santiago Pascual, Maruchan Park, Joan Serrà, Antonio Bonafonte, Kang-Hun Ahn

arXiv:1712.06340v1

10.59 citations

Originality: Incremental advance

AI Analysis

This work addresses low-resource adaptation of speech enhancement to new languages such as Catalan and Korean. The contribution is incremental, since it builds on existing GAN methods.

The study adapts a speech enhancement GAN, pre-trained on English, to new languages and noise conditions with minimal data. Fine-tuning the generator with just 10 minutes of data achieved performance comparable to using about 100 times more data, and test performance on unseen noise remained relatively stable as the number of training noise types varied.

Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems to new, low-resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by fine-tuning the generator with small amounts of data. We investigate the minimum requirements to obtain stable behavior in terms of several objective metrics in two very different languages: Catalan and Korean. We also study the variability of test performance on unseen noise as a function of the number of different noise types available for training. Results show that adapting a pre-trained English model with 10 min of data already achieves performance comparable to having two orders of magnitude more data. They also demonstrate the relative stability of test performance with respect to the number of training noise types.
