Jun 15, 2021

MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

arXiv:2106.07886v3 · 4 citations
Originality: Highly original
AI Analysis

This work addresses the problem of slow synthesis speed for Korean singing voice synthesis, offering a significant speed improvement for practical applications.

The paper tackles the slow inference speed of neural singing voice synthesis systems by proposing MLP Singer, a parallel Korean singing voice synthesis system based on an MLP architecture, achieving real-time factors of up to 200 on CPUs and 3400 on GPUs while outperforming a larger autoregressive GAN-based system in audio quality.
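The speed figures above are given as real-time factors. The following is a minimal sketch of that arithmetic. It assumes the convention implied by the text: seconds of audio produced per second of synthesis, so higher is faster. Many papers use the inverse ratio instead. The function names and the 180-second song length are hypothetical illustrations, not measurements from the paper.

```python
def real_time_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock synthesis time.

    Assumption: "higher is faster" convention, inferred from the quoted
    figures (200 on CPU, 3400 on GPU); some papers report the inverse.
    """
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of audio at a given RTF."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


# Hypothetical example: a 3-minute (180 s) song at the quoted peak speeds.
cpu_seconds = synthesis_time(180.0, 200.0)   # under one second on CPU
gpu_seconds = synthesis_time(180.0, 3400.0)  # tens of milliseconds on GPU
```

Under this reading, an RTF of 200 means a full song would be rendered in less than a second on CPU.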

Recent developments in deep learning have significantly improved the quality of synthesized singing voice audio. However, prominent neural singing voice synthesis systems suffer from slow inference speed due to their autoregressive design. Inspired by MLP-Mixer, a novel architecture introduced in the vision literature for attention-free image classification, we propose MLP Singer, a parallel Korean singing voice synthesis system. To the best of our knowledge, this is the first work that uses an entirely MLP-based architecture for voice synthesis. Listening tests demonstrate that MLP Singer outperforms a larger autoregressive GAN-based system in both audio quality and synthesis speed. In particular, MLP Singer achieves a real-time factor of up to 200 on CPUs and 3400 on GPUs, enabling order-of-magnitude faster generation in both environments.

Code Implementations: 1 repo
