SDAS Nov 13, 2017

Audio-to-score alignment of piano music using RNN-based automatic music transcription

arXiv:1711.04480v1, 23 citations
Originality: Incremental advance
AI Analysis

This work addresses alignment accuracy for piano music analysis, representing an incremental improvement over prior methods.

The paper tackles audio-to-score alignment for piano music by using RNN-based automatic music transcription as a feature extractor, achieving a mean onset error of less than 10 ms on the MAPS dataset.

We propose a framework for audio-to-score alignment of piano performances that employs automatic music transcription (AMT) using neural networks. Even though the AMT result may contain some errors, the note prediction output can be regarded as a learned feature representation that is directly comparable to a MIDI note or chroma representation. To this end, we employ two recurrent neural networks that serve as AMT-based feature extractors for the alignment algorithm. One predicts the presence of 88 notes or 12 chroma at the frame level, and the other detects note onsets in 12 chroma. We combine the two types of learned features for audio-to-score alignment. For comparability, we apply dynamic time warping as the alignment algorithm without any additional post-processing. We evaluate the proposed framework on the MAPS dataset and compare it to previous work. The results show that the alignment framework with the learned features significantly improves accuracy, achieving a mean onset error of less than 10 ms.
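The alignment step described in the abstract, dynamic time warping over two comparable feature sequences, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the features already exist as arrays of shape (frames, dimensions), for example 12 chroma activations per frame from the audio side and a chroma roll rendered from the score. The function name `dtw_path` and the choice of cosine distance are assumptions made for illustration; the abstract does not specify the local distance measure.

```python
import numpy as np


def dtw_path(audio_feats, score_feats):
    """Align two feature sequences with plain DTW (hypothetical helper).

    Both inputs are arrays of shape (frames, dims). Returns a list of
    (audio_frame, score_frame) index pairs from start to end.
    """
    # Local cost: cosine distance between every audio/score frame pair.
    # (Assumed metric; the small epsilon avoids division by zero.)
    a = audio_feats / (np.linalg.norm(audio_feats, axis=1, keepdims=True) + 1e-9)
    s = score_feats / (np.linalg.norm(score_feats, axis=1, keepdims=True) + 1e-9)
    cost = 1.0 - a @ s.T
    n, m = cost.shape

    # Accumulated cost with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # diagonal step
                acc[i - 1, j],      # advance audio only
                acc[i, j - 1],      # advance score only
            )

    # Backtrack from the last cell to the first along cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
        path.append((i - 1, j - 1))
    return path[::-1]
```

As a usage example, a three-note toy score such as `np.eye(12)[[0, 4, 7]]` can be aligned against a "performance" built with `np.repeat(score, 2, axis=0)`, which holds each frame twice as long. Converting the returned frame pairs into onset times then only requires multiplying the frame indices by the feature hop size, which is left out here because the abstract does not state it.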

