SD AS · May 7, 2018

A Data-Driven Approach to Smooth Pitch Correction for Singing Voice in Pop Music

Sanna Wager, Lijiang Guo, Aswin Sivaraman, Minje Kim

arXiv:1805.02603v1 · 2.91 citations

Originality: Incremental advance

AI Analysis

This work addresses pitch correction for karaoke and pop music production. It offers a more nuanced alternative to existing real-time methods. It is an incremental advance because it builds on prior pitch correction techniques.

The paper tackles pitch correction for singing voice in pop music. It uses a machine-learning model that predicts a continuous pitch shift from the vocal and accompaniment tracks. Trained on semi-professional singing, the method preserves expressive features such as pitch bending and vibrato.
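The predicted pitch shift is expressed in cents, where 1200 cents make one octave. A shift of c cents therefore multiplies frequency by 2^(c/1200). The helpers below are a minimal sketch of that conversion. They assume a fundamental-frequency (f0) track in Hz is already available. The function names are hypothetical and do not come from the paper.

```python
import math

def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch shift in cents (1200 cents per octave)."""
    return 2.0 ** (cents / 1200.0)

def ratio_to_cents(f_target: float, f_source: float) -> float:
    """Signed interval in cents from f_source to f_target (both in Hz)."""
    return 1200.0 * math.log2(f_target / f_source)

def apply_shift(f0_track, cents_track):
    """Apply a per-frame shift in cents to an f0 track in Hz (hypothetical helper)."""
    return [f * cents_to_ratio(c) for f, c in zip(f0_track, cents_track)]
```

For example, `apply_shift([440.0, 220.0], [1200.0, 0.0])` raises the first frame by one octave and leaves the second frame unchanged, giving `[880.0, 220.0]`.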

In this paper, we present a machine-learning approach to pitch correction for voice in a karaoke setting, where the vocals and accompaniment are on separate, time-aligned tracks. The network takes as input the time-frequency representation of the two tracks. It predicts the amount of pitch shifting, in cents, required to make the voice sound in tune with the accompaniment. It is trained on examples of semi-professional singing. Existing real-time pitch correction methods track the pitch and map it to a discrete set of notes, for example the twelve classes of the equal-tempered scale. The proposed approach instead learns a correction that is continuous in both frequency and time, directly from the harmonics of the vocal and accompaniment tracks. A Recurrent Neural Network (RNN) model provides a correction that takes context into account, so expressive pitch bending and vibrato are preserved. This method can be extended to unsupervised pitch correction of a vocal performance, which is popularly referred to as autotuning.
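The abstract contrasts its model with the discrete approach, in which pitch is snapped to the nearest note of the equal-tempered scale. The sketch below shows that baseline so the contrast is concrete. It is not the paper's RNN. It assumes an A4 = 440 Hz reference and an already-extracted f0 track in Hz, and the names `nearest_et_correction` and `hard_snap` are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed tuning reference, not specified in the paper

def nearest_et_correction(f0_hz: float, ref_hz: float = A4_HZ) -> float:
    """Cents needed to move f0_hz onto the nearest 12-TET note (hard snapping)."""
    cents_from_ref = 1200.0 * math.log2(f0_hz / ref_hz)
    nearest_note = 100.0 * round(cents_from_ref / 100.0)  # 100 cents per semitone
    return nearest_note - cents_from_ref

def hard_snap(f0_track, ref_hz: float = A4_HZ):
    """Snap every frame independently; no temporal context is used."""
    return [f * 2.0 ** (nearest_et_correction(f, ref_hz) / 1200.0) for f in f0_track]
```

A vibrato of ±30 cents around A4 shows the limitation. Each frame is corrected independently back to about 440 Hz, so the vibrato is flattened. A slightly flat 435 Hz note receives a correction of roughly +20 cents. The per-frame, context-free behavior is what the abstract's continuous, context-aware correction is meant to avoid.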

