SD · MM · May 17

A Distribution Matching Approach to Neural Piano Transcription with Optimal Transport

arXiv:2605.17405
AI Analysis

For researchers in music transcription, this work introduces a new paradigm that improves onset detection by handling temporal misalignment.

The paper formalizes automatic piano transcription as an optimal transport problem, achieving state-of-the-art onset detection on the MAESTRO dataset.

This paper describes a novel paradigm that formalizes automatic piano transcription (APT) as an optimal transport (OT) problem rather than as a frame-level multi-label binary classification problem. Our method learns to minimize the cost of transporting a predicted distribution of note events to the ground-truth distribution over time and frequency. The OT loss can thus accommodate temporal misalignment, leading to perceptually relevant optimization. We also propose a convolutional recurrent neural network (CRNN) with a harmonics-aware attention mechanism to capture the spectro-temporal dependencies inherent in music. Our experiments on the MAESTRO dataset showed that our method attained state-of-the-art performance in onset detection. We also confirmed the versatility of the OT loss by applying it to existing models.
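To illustrate why a transport-based loss tolerates temporal misalignment where frame-level classification does not, here is a minimal sketch. It is not the paper's loss: it assumes a single pitch, a one-dimensional time axis with unit frame spacing, and the closed-form 1-D Wasserstein-1 distance (the area between the two cumulative distributions), whereas the abstract describes transport over both time and frequency. The function names `wasserstein_1d` and `bce` and the toy onset positions are hypothetical.

```python
import numpy as np

def wasserstein_1d(pred, target, eps=1e-8):
    """W1 distance between two non-negative 1-D activations on a unit-spaced grid.

    Assumption: both inputs are normalized to sum to 1, so the distance is the
    summed absolute difference of their cumulative distributions.
    """
    p = pred / (pred.sum() + eps)
    q = target / (target.sum() + eps)
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def bce(pred, target, eps=1e-7):
    """Frame-level binary cross-entropy, averaged over frames."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

n_frames = 100
target = np.zeros(n_frames)
target[50] = 1.0                 # ground-truth onset at frame 50
near = np.zeros(n_frames)
near[51] = 1.0                   # prediction off by 1 frame
far = np.zeros(n_frames)
far[80] = 1.0                    # prediction off by 30 frames

ot_near, ot_far = wasserstein_1d(near, target), wasserstein_1d(far, target)
bce_near, bce_far = bce(near, target), bce(far, target)

# The transport cost grows with the size of the timing error ...
print(f"OT  loss: near={ot_near:.3f}  far={ot_far:.3f}")
# ... while frame-level BCE penalizes both predictions equally.
print(f"BCE loss: near={bce_near:.3f}  far={bce_far:.3f}")
```

In this toy setting the transport cost scales with the number of frames the onset has to be moved, so a near miss is penalized far less than a distant one. Frame-wise cross-entropy sees one missed frame and one false alarm in both cases and cannot tell them apart.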
