AS · SD · Oct 5, 2021

Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet

arXiv:2110.02360v1 · 19 citations
Originality: Incremental advance
AI Analysis

This work addresses audio editing for applications such as speech manipulation and singing voice synthesis. It is an incremental advance because it builds on the existing LPCNet vocoder.

The paper tackles pitch-shifting and time-stretching of audio signals, operations that existing methods often perform with audible artifacts, by proposing Controllable LPCNet (CLPCNet). The results show that CLPCNet pitch-shifts speech with high accuracy in objective evaluations and meets or exceeds competitive methods in subjective quality and naturalness on unseen datasets.

Modifying the pitch and timing of an audio signal are fundamental audio editing operations with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis. Thus far, methods for pitch-shifting and time-stretching that use digital signal processing (DSP) have been favored over deep learning approaches due to their speed and relatively higher quality. However, even existing DSP-based methods for pitch-shifting and time-stretching induce artifacts that degrade audio quality. In this paper, we propose Controllable LPCNet (CLPCNet), an improved LPCNet vocoder capable of pitch-shifting and time-stretching speech. For objective evaluation, we show that CLPCNet performs pitch-shifting of speech on unseen datasets with high accuracy relative to prior neural methods. For subjective evaluation, we demonstrate that the quality and naturalness of pitch-shifting and time-stretching with CLPCNet on unseen datasets meet or exceed those of competitive neural- or DSP-based approaches.
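Both operations can be pictured as edits to a vocoder's conditioning inputs rather than to the waveform. The sketch below is a minimal illustration under stated assumptions, not CLPCNet's published procedure: it assumes a vocoder driven by a per-frame F0 track in Hz (0.0 marking unvoiced frames) and a sequence of frame-level feature vectors, and the function names `pitch_shift_f0` and `time_stretch_frames` are hypothetical. Pitch-shifting by s semitones multiplies each voiced F0 value by 2^(s/12); time-stretching resamples the frame sequence by linear interpolation so that a rate above 1 yields fewer frames (faster speech) and a rate below 1 yields more.

```python
import math
from typing import List


def pitch_shift_f0(f0_hz: List[float], semitones: float) -> List[float]:
    """Scale each voiced F0 value by 2**(semitones / 12); unvoiced frames (0.0) stay 0.0."""
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio if f > 0.0 else 0.0 for f in f0_hz]


def time_stretch_frames(frames: List[List[float]], rate: float) -> List[List[float]]:
    """Resample a frame sequence by linear interpolation.

    rate > 1 shortens the sequence (faster speech); rate < 1 lengthens it.
    The first and last output frames coincide with the first and last input frames.
    """
    if not frames:
        return []
    n_in = len(frames)
    n_out = max(1, round(n_in / rate))
    out = []
    for j in range(n_out):
        # Map output frame j back onto the input time axis.
        pos = j * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        i = int(math.floor(pos))
        i_next = min(i + 1, n_in - 1)
        w = pos - i  # interpolation weight toward the next input frame
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[i], frames[i_next])])
    return out


if __name__ == "__main__":
    print(pitch_shift_f0([100.0, 0.0, 200.0], 12))  # one octave up
    print(time_stretch_frames([[0.0], [1.0], [2.0], [3.0]], 2.0))  # twice as fast
```

Running the script prints `[200.0, 0.0, 400.0]` and `[[0.0], [3.0]]`. Interpolating an F0 track this naive way would blur voiced/unvoiced boundaries, which is one reason a real system would treat pitch and other features separately; this sketch only conveys the general idea of feature-domain control.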

Code Implementations: 1 repo
