AS, LG, SD | Aug 11, 2020

Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems

arXiv:2008.04574v1 | 27 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for low-cost neural TTS systems, particularly for mobile devices, by making incremental improvements to an existing efficient vocoder.

The paper tackles the high computational complexity of neural text-to-speech systems by proposing two techniques, sample-bunching and bit-bunching, that reduce the complexity of the LPCNet vocoder. The result is a 2.19x run-time improvement on a mobile device, with a decrease of less than 0.1 in mean opinion score.

LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low. In this work, we present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system. These techniques are: 1) Sample-bunching, which allows LPCNet to generate more than one audio sample per inference; and 2) Bit-bunching, which reduces the computations in the final layer of LPCNet. With the proposed bunching techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS) acoustic model, shows a 2.19x improvement over the baseline run-time when running on a mobile device, with a decrease of less than 0.1 in TTS mean opinion score (MOS).
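
The intuition behind the two techniques can be sketched with simple counting. This is a rough illustration, not the paper's implementation. The function names, the 16 kHz sample rate, the bunch size of 2, and the 8-bit output split into two 4-bit groups are all assumptions chosen for the example. Treating bit-bunching as a split of the output bits into a high and a low group is one plausible reading of "reduces the computations in the final layer", not something the abstract spells out.

```python
from typing import Optional


def inferences_per_second(sample_rate_hz: int, bunch_size: int) -> float:
    """Network inferences needed per second of audio when each
    inference emits `bunch_size` samples (sample-bunching idea)."""
    if bunch_size < 1:
        raise ValueError("bunch_size must be at least 1")
    return sample_rate_hz / bunch_size


def output_logits(total_bits: int, high_bits: Optional[int] = None) -> int:
    """Number of final-layer logits per sample.

    Without a split, one softmax covers all 2**total_bits values.
    With a split (assumed reading of bit-bunching), two smaller
    softmaxes cover the high-bit and low-bit groups separately.
    """
    if high_bits is None:
        return 2 ** total_bits
    if not 0 < high_bits < total_bits:
        raise ValueError("high_bits must lie strictly between 0 and total_bits")
    low_bits = total_bits - high_bits
    return 2 ** high_bits + 2 ** low_bits


if __name__ == "__main__":
    # Illustrative values only; none of these come from the paper.
    print(inferences_per_second(16000, 1))  # one sample per inference
    print(inferences_per_second(16000, 2))  # two samples per inference
    print(output_logits(8))                 # single softmax over 8 bits
    print(output_logits(8, high_bits=4))    # two softmaxes over 4 bits each
```

These counts only show why fewer inferences and a smaller output layer can lower cost. The reported 2.19x speed-up is a measured end-to-end figure and does not follow directly from this arithmetic.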
