Single-channel speech enhancement using learnable loss mixup
This work addresses generalization in speech enhancement, a problem that matters for audio quality in applications such as communication and hearing aids, though the contribution appears incremental because it builds on existing loss mixup techniques.
The paper tackles the generalization problem in supervised learning for single-channel speech enhancement by proposing learnable loss mixup (LLM), a training method that optimizes a mixture of loss functions for random sample pairs. On the VCTK benchmark, it achieves a PESQ score of 3.26, outperforming state-of-the-art methods.
Generalization remains a major problem in supervised learning of single-channel speech enhancement. In this work, we propose learnable loss mixup (LLM), a simple, low-effort training paradigm, to improve the generalization of deep learning-based speech enhancement models. Loss mixup, of which learnable loss mixup is a special variant, optimizes a mixture of the loss functions of random sample pairs, so that the model is trained on virtual training data constructed from these pairs. In learnable loss mixup, the loss functions are mixed by a non-linear mixing function that is conditioned on the mixed data and learned automatically via neural parameterization. Our experimental results on the VCTK benchmark show that learnable loss mixup achieves 3.26 PESQ, outperforming the state-of-the-art.
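The loss mixup idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`loss_mixup`, `sigmoid_mix`), the MSE loss, the Beta-distributed mixing coefficient, and the sigmoid form of the non-linear mixing function are all assumptions made for illustration. In particular, the abstract says the mixing function is conditioned on the mixed data, while this toy version depends only on the mixing coefficient, and its parameters `a` and `b` are fixed here rather than trained.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between a prediction and a target signal."""
    return float(np.mean((pred - target) ** 2))

def loss_mixup(model, x_i, y_i, x_j, y_j, lam, mix_fn=lambda lam: lam):
    """Loss mixup for one sample pair (illustrative sketch).

    The inputs are mixed with coefficient `lam`, the model is evaluated
    once on the mixed (virtual) input, and the two per-sample losses are
    combined with weight mix_fn(lam). With the default identity mix_fn
    the losses are mixed linearly, with the same weights as the inputs.
    """
    x_mix = lam * x_i + (1.0 - lam) * x_j   # virtual training input
    pred = model(x_mix)
    w = mix_fn(lam)                         # weight on the first loss
    return w * mse(pred, y_i) + (1.0 - w) * mse(pred, y_j)

def sigmoid_mix(lam, a=4.0, b=-2.0):
    """Hypothetical non-linear mixing function.

    In a learnable variant, `a` and `b` (or a small network in their
    place) would be trained together with the model; here they are
    fixed constants chosen only for the example.
    """
    return 1.0 / (1.0 + np.exp(-(a * lam + b)))

# Toy data: "noisy" inputs and "clean" targets (assumed shapes and values).
rng = np.random.default_rng(0)
x_i, x_j = rng.normal(size=16), rng.normal(size=16)
y_i, y_j = 0.5 * x_i, 0.5 * x_j

def identity_model(x):
    """Stand-in for an enhancement network."""
    return x

lam = rng.beta(1.0, 1.0)  # mixing coefficient drawn from Beta(1, 1)
loss_linear = loss_mixup(identity_model, x_i, y_i, x_j, y_j, lam)
loss_nonlinear = loss_mixup(identity_model, x_i, y_i, x_j, y_j, lam,
                            mix_fn=sigmoid_mix)
```

With `lam = 1.0` and the default linear mixing, the mixed loss reduces to the ordinary loss on the first sample, which is a quick sanity check that the sketch is consistent with standard supervised training as a special case.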