SD, AI, LG, AS | Dec 8, 2020

I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch

arXiv:2012.04572v2 | 38 citations
AI Analysis

This work identifies a fundamental limitation of current audio-to-audio loss functions: they judge pitch poorly. The finding matters to researchers developing self-supervised audio learning models, who rely on these losses as training signals.

This paper evaluates common audio-to-audio loss functions on a synthetic benchmark designed to measure pitch distance between two stationary sinusoids. It finds that many of these losses exhibit a poor sense of pitch direction, failing at a task trivial for humans.

Growing research demonstrates that synthetic failure modes imply poor generalization. We compare commonly used audio-to-audio losses on a synthetic benchmark, measuring the pitch distance between two stationary sinusoids. The results are surprising: many have poor sense of pitch direction. These shortcomings are exposed using simple rank assumptions. Our task is trivial for humans but difficult for these audio distances, suggesting significant progress can be made in self-supervised audio learning by improving current losses.
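
The kind of check described above can be sketched in a few lines. This is a minimal illustration, not the paper's benchmark code: the choice of loss (mean L1 distance between magnitude spectrograms), the sample rate, the STFT settings, the test frequencies, and the helper names `sinusoid`, `magnitude_spectrogram`, `spectral_l1`, and `pitch_distance_profile` are all assumptions made for this example.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
SAMPLE_RATE = 16000  # Hz
DURATION = 0.5       # seconds
N_FFT = 1024         # STFT frame length in samples
HOP = 256            # STFT hop size in samples


def sinusoid(freq_hz, sample_rate=SAMPLE_RATE, duration=DURATION):
    """Return a stationary unit-amplitude sine wave at freq_hz."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    return np.sin(2.0 * np.pi * freq_hz * t)


def magnitude_spectrogram(signal, n_fft=N_FFT, hop=HOP):
    """Hann-windowed magnitude STFT with shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def spectral_l1(a, b):
    """Mean absolute difference between two magnitude spectrograms."""
    return float(np.mean(np.abs(magnitude_spectrogram(a) - magnitude_spectrogram(b))))


def pitch_distance_profile(reference_hz, candidate_hz, distance=spectral_l1):
    """Score each candidate frequency against the reference sinusoid."""
    reference = sinusoid(reference_hz)
    return [(f, distance(reference, sinusoid(f))) for f in candidate_hz]


# Candidates are ordered by increasing pitch distance from the 440 Hz reference.
profile = pitch_distance_profile(440.0, [450.0, 500.0, 600.0, 800.0, 1200.0])
for freq, score in profile:
    print(f"{freq:7.1f} Hz  distance = {score:.4f}")
```

Under these settings the score should rise while the two spectral peaks still overlap and then level off once they separate, so a candidate 760 Hz from the reference scores about the same as one 60 Hz away. A distance with a good sense of pitch would instead keep growing with the pitch gap, which is the sort of rank expectation the abstract refers to.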

