ASSDOct 23, 2020

Toward Expressive Singing Voice Correction: On Perceptual Validity of Evaluation Metrics for Vocal Melody Extraction

arXiv:2010.12196v11 citations
Originality Incremental advance
AI Analysis

This addresses the need for more effective automated tools for amateur singers, though it is incremental as it builds on previous work by integrating advanced techniques.

The paper tackled the problem of automating expressive singing voice correction (SVC) for both pitch and rhythmic errors, finding that standard metrics for vocal melody extraction yield high pitch accuracy but do not correlate well with perceptual scores.

Singing voice correction (SVC) is an appealing application for amateur singers. Commercial products automate SVC by snapping pitch contours to equal-tempered scales, which could lead to deadpan modifications. Together with the neglect of rhythmic errors, extensive manual corrections are still necessary. In this paper, we present a streamlined system to automate expressive SVC for both pitch and rhythmic errors. Particularly, we extend a previous work by integrating advanced techniques for singing voice separation (SVS) and vocal melody extraction. SVC is achieved by temporally aligning the source-target pair, followed by replacing pitch and rhythm of the source with those of the target. We evaluate the framework by a comparative study for melody extraction which involves both subjective and objective evaluations, whereby we investigate perceptual validity of the standard metrics through the lens of SVC. The results suggest that the high pitch accuracy obtained by the metrics does not signify good perceptual scores.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes