Handling Background Noise in Neural Speech Generation
This work addresses a practical limitation for speech coding applications, but it is incremental as it builds on existing neural generative models.
The paper addresses the poor performance of neural speech generation models on noisy inputs. It investigates the cause and evaluates methods to address it, finding that a denoising preprocessing stage during feature extraction, combined with clean speech as the training target, yields the best results.
Recent advances in neural-network-based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise, preventing their use in practical applications. In this paper we examine the reason and discuss methods to overcome this issue. Placing a denoising preprocessing stage when extracting features and targeting clean speech during training is shown to be the best performing strategy.
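The best performing strategy can be illustrated with a minimal sketch of how one training example might be assembled: the conditioning features come from a denoised version of the noisy input, while the generation target stays the clean waveform. Everything here is an assumption made for illustration and not taken from the paper: the function names (`mix_at_snr`, `frame_log_energy`, `make_training_pair`), the toy log-energy feature, the frame length, and the `denoise` callable, which stands in for whatever denoiser is actually used.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR (dB)."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def frame_log_energy(signal, frame_len=160):
    """Toy stand-in for a codec feature extractor: log energy per frame.

    Hypothetical; a real system would use its own feature set.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.log(np.mean(frames ** 2, axis=1) + 1e-9)


def make_training_pair(clean, noise, snr_db, denoise,
                       extract_features=frame_log_energy):
    """Build one (features, target) pair.

    Features are extracted from the denoised noisy input; the target is the
    clean speech, so the generative model is never trained to reproduce noise.
    """
    noisy = mix_at_snr(clean, noise, snr_db)
    features = extract_features(denoise(noisy))
    return features, clean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(1600)   # placeholder for a clean utterance
    noise = rng.standard_normal(1600)   # placeholder for background noise
    # Identity function used as a placeholder denoiser (assumption).
    features, target = make_training_pair(clean, noise, 10.0, denoise=lambda x: x)
    print(features.shape, target.shape)
```

The denoiser is passed in as a callable so that the same pairing logic applies regardless of which enhancement method is chosen. At inference time the same `denoise` followed by `extract_features` chain would run on the encoder side.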