One Shot Learning for Speech Separation
This work aims to improve how well speech separation models generalize to new speakers and noisy environments, an incremental improvement of interest to speech processing researchers.
This paper addresses the failure of speech separation models on unseen speakers and in noisy environments by applying meta-learning. The authors propose learning a meta-initialization that adapts to new speakers after observing only one mixture, and they demonstrate adaptation to both new speakers and noisy environments.
Despite the recent success of speech separation models, they fail to separate sources properly when facing different sets of speakers or noisy environments. To tackle this problem, we propose applying meta-learning to the speech separation task. We aim to find a meta-initialization model that can quickly adapt to new speakers by seeing only one mixture generated by those speakers. In this paper, we use the model-agnostic meta-learning (MAML) algorithm and the almost no inner loop (ANIL) algorithm with Conv-TasNet to achieve this goal. The experimental results show that our model can adapt not only to a new set of speakers but also to noisy environments. Furthermore, we find that the encoder and decoder serve as the feature-reuse layers, while the separator is the task-specific module.
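The difference between the MAML and ANIL inner loops can be illustrated with a minimal sketch. It is not the implementation used in this work. The three scalar "modules", the squared-error loss, the learning rate, and the step count are all hypothetical stand-ins for Conv-TasNet and its training setup, and the outer (meta-training) loop is omitted. The sketch assumes that the separator is the module adapted in the ANIL inner loop, in line with the finding that it is the task-specific part.

```python
# Toy stand-in for Conv-TasNet (assumption): three scalar "modules" in sequence,
# prediction = decoder * separator * encoder * x.

def predict(params, x):
    return params["decoder"] * params["separator"] * params["encoder"] * x

def loss(params, x, y):
    # Squared error on a single example (stand-in for a separation loss).
    return (predict(params, x) - y) ** 2

def gradients(params, x, y):
    # Analytical gradient of the squared error w.r.t. each scalar parameter.
    err = 2.0 * (predict(params, x) - y)
    e, s, d = params["encoder"], params["separator"], params["decoder"]
    return {
        "encoder": err * d * s * x,
        "separator": err * d * e * x,
        "decoder": err * s * e * x,
    }

def adapt(meta_init, x, y, trainable, lr=0.05, steps=5):
    """Inner-loop adaptation on one (x, y) example.

    Only the parameters named in `trainable` are updated; the rest are
    reused unchanged from the meta-initialization.
    """
    params = dict(meta_init)  # copy so the meta-initialization is untouched
    for _ in range(steps):
        grads = gradients(params, x, y)
        for name in trainable:
            params[name] -= lr * grads[name]
    return params

# Hypothetical meta-initialization and a single "mixture"/target pair.
meta_init = {"encoder": 1.0, "separator": 1.0, "decoder": 1.0}
x, y = 1.0, 2.0

# MAML-style inner loop: every module is adapted.
maml_params = adapt(meta_init, x, y, trainable=["encoder", "separator", "decoder"])

# ANIL-style inner loop: only the task-specific module (here the separator).
anil_params = adapt(meta_init, x, y, trainable=["separator"])

print("loss before adaptation:", loss(meta_init, x, y))
print("loss after MAML-style adaptation:", loss(maml_params, x, y))
print("loss after ANIL-style adaptation:", loss(anil_params, x, y))
```

In both cases adaptation uses a single example, mirroring the one-mixture setting. The ANIL-style variant leaves the encoder and decoder at their meta-initialized values, which is the sense in which they act as feature-reuse layers.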