Audio-to-Score Alignment Using Deep Automatic Music Transcription
This work addresses precise music synchronization for applications such as music education and analysis; it appears incremental, since it builds on existing transcription and alignment techniques.
The paper tackles note-level audio-to-score alignment by combining deep automatic music transcription with HMM-based methods; the authors report a remarkable advancement beyond the state of the art, supported by extensive tests on multiple datasets.
Audio-to-score alignment (A2SA) is a multimodal task that consists of aligning audio signals to music scores. Recent literature confirms the benefits of Automatic Music Transcription (AMT) for A2SA at the frame level. In this work, we elaborate on the use of Deep Learning (DL) models for AMT to achieve alignment at the note level. We propose a method that combines HMM-based score-to-score alignment with AMT and shows a remarkable advancement beyond the state of the art. We design a systematic procedure to take advantage of large datasets that do not offer an aligned score. Finally, we perform a thorough comparison and extensive tests on multiple datasets.
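To make the idea of note-level alignment concrete, the following minimal sketch matches score notes to notes produced by a transcription model. It is not the method proposed in this work: it swaps the HMM-based score-to-score alignment for plain dynamic time warping (DTW) over pitch sequences, and it assumes that both inputs are already available as lists of (MIDI pitch, onset) pairs sorted by onset. The names `Note` and `align_notes` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Assumed note representation: (MIDI pitch, onset time).
# Score onsets may be in beats, transcribed onsets in seconds.
Note = Tuple[int, float]


def align_notes(score: List[Note], transcribed: List[Note]) -> List[Tuple[int, int]]:
    """Return (score_index, transcribed_index) pairs along a cheapest DTW path.

    Local cost is 0 when the pitches match and 1 otherwise. This is a
    simple stand-in for an HMM-based symbolic alignment, not a replacement.
    """
    n, m = len(score), len(transcribed)
    if n == 0 or m == 0:
        return []

    inf = float("inf")
    # cost[i][j]: cheapest cost of aligning score[:i] with transcribed[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if score[i - 1][0] == transcribed[j - 1][0] else 1.0
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in the score only
                cost[i][j - 1],      # advance in the transcription only
            )

    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return path


if __name__ == "__main__":
    score = [(60, 0.0), (62, 1.0), (64, 2.0)]             # onsets in beats
    transcribed = [(60, 0.51), (62, 1.02), (64, 1.49)]    # onsets in seconds
    for si, ti in align_notes(score, transcribed):
        print(f"score onset {score[si][1]} -> audio onset {transcribed[ti][1]}")
```

Each pair on the returned path links a score note to a transcribed note, so the score onset can be mapped to the corresponding onset in the audio. A practical system would also have to handle wrong, missing, and extra notes in the transcription, which a pitch-only cost like this one treats only crudely.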