A Distribution Matching Approach to Neural Piano Transcription with Optimal Transport
For researchers in music transcription, this work introduces a new paradigm that improves onset detection by handling temporal misalignment.
The paper formalizes automatic piano transcription as an optimal transport problem, achieving state-of-the-art onset detection on the MAESTRO dataset.
This paper describes a novel paradigm that formalizes automatic piano transcription (APT) as an optimal transport (OT) problem rather than as a frame-level multi-label binary classification problem. Our method learns to minimize the cost of transporting the predicted distribution of note events to the ground-truth distribution over time and frequency. The OT loss can thus accommodate temporal misalignment, leading to perceptually relevant optimization. We also propose a convolutional recurrent neural network (CRNN) with a harmonics-aware attention mechanism to capture the spectro-temporal dependencies inherent in music. Our experiments on the MAESTRO dataset showed that our method attained state-of-the-art performance in onset detection. We further confirmed the versatility of the OT loss by applying it to existing models.
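To illustrate why a transport cost tolerates temporal misalignment where a frame-level classification loss does not, the following is a minimal sketch restricted to a single pitch and the time axis only. It is not the loss used in the paper: the function names, the 10 ms frame length, the absolute-time-difference ground cost, and the unit-mass normalization are all assumptions made for illustration, and the method described above transports mass over both time and frequency.

```python
import numpy as np

def wasserstein_1d(pred, target, frame_sec=0.01):
    """1-D Wasserstein-1 cost between two nonnegative onset activations.

    Assumption: both inputs are normalized to unit mass and the ground cost
    is the absolute time difference between frames (frame_sec is hypothetical).
    """
    p = pred / pred.sum()
    q = target / target.sum()
    # In 1-D with an |i - j| ground cost, W1 is the L1 distance between CDFs.
    return float(frame_sec * np.abs(np.cumsum(p) - np.cumsum(q)).sum())

def binary_cross_entropy(pred, target, eps=1e-7):
    """Frame-level binary cross-entropy, averaged over frames."""
    pred = np.clip(pred, eps, 1.0 - eps)
    losses = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(losses.mean())

def onset(frame, n_frames=100):
    """One-hot onset activation with a single onset at the given frame."""
    x = np.zeros(n_frames)
    x[frame] = 1.0
    return x

target = onset(50)   # ground-truth onset
near = onset(51)     # prediction one frame late
far = onset(80)      # prediction thirty frames late

ot_near = wasserstein_1d(near, target)
ot_far = wasserstein_1d(far, target)
bce_near = binary_cross_entropy(near, target)
bce_far = binary_cross_entropy(far, target)

print(f"OT cost:  near={ot_near:.2f}  far={ot_far:.2f}")
print(f"BCE loss: near={bce_near:.3f}  far={bce_far:.3f}")
```

In this toy setting the transport cost grows with the size of the timing error (0.01 s for the one-frame shift, 0.30 s for the thirty-frame shift), whereas the frame-level cross-entropy assigns the same penalty to both predictions because neither overlaps the target frame.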