RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection
This work addresses the challenge of integrated music performance analysis for researchers and musicians, though its contribution is incremental in that it combines existing tasks.
The study introduces RUMAA, a transformer-based framework that unifies score-to-performance alignment, transcription, and mistake detection. On a public piano music dataset, it matches state-of-the-art alignment methods on non-repeated scores and outperforms them on scores with repeats.
This study introduces RUMAA, a transformer-based framework for music performance analysis that unifies score-to-performance alignment, score-informed transcription, and mistake detection in a near end-to-end manner. Unlike prior methods, which address these tasks separately, RUMAA integrates them using pre-trained score and audio encoders and a novel tri-stream decoder that captures task interdependencies through proxy tasks. It aligns human-readable MusicXML scores containing repeat symbols to full-length performance audio, overcoming a limitation of traditional MIDI-based methods, which rely on manually unfolded score-MIDI data with pre-specified repeat structures. On a public piano music dataset, RUMAA matches state-of-the-art alignment methods on non-repeated scores and outperforms them on scores with repeats, while also delivering promising transcription and mistake detection results.
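To make the "manually unfolded" preprocessing concrete, the following minimal sketch shows what unfolding a score with repeat symbols into a linear measure sequence involves. It is not RUMAA's method and is not taken from the paper: the `Measure` type and `unfold_repeats` function are hypothetical names introduced here for illustration, and the sketch assumes simple, non-nested repeats that are each played exactly twice, with no volta endings or D.C./D.S. jumps.

```python
from typing import List, NamedTuple


class Measure(NamedTuple):
    """Hypothetical, simplified measure record (illustration only)."""
    number: int
    repeat_start: bool = False  # forward repeat barline at the start
    repeat_end: bool = False    # backward repeat barline at the end


def unfold_repeats(measures: List[Measure]) -> List[int]:
    """Return the played order of measure numbers, taking each repeat once.

    Assumes simple, non-nested repeats; voltas and D.C./D.S. are ignored.
    """
    order: List[int] = []
    start = 0      # index to jump back to at the next backward repeat
    taken = set()  # indices of backward repeats that were already taken
    i = 0
    while i < len(measures):
        m = measures[i]
        if m.repeat_start:
            start = i
        order.append(m.number)
        if m.repeat_end and i not in taken:
            taken.add(i)  # jump back only the first time we reach this barline
            i = start
        else:
            i += 1
    return order


# Four measures, with measures 2-3 enclosed in repeat barlines.
score = [
    Measure(1),
    Measure(2, repeat_start=True),
    Measure(3, repeat_end=True),
    Measure(4),
]
unfolded = unfold_repeats(score)
print(unfolded)  # [1, 2, 3, 2, 3, 4]
```

A MIDI-based aligner needs a sequence like `unfolded` to be fixed in advance, and it will be wrong whenever the performer skips or adds a repeat. The abstract's claim is that RUMAA instead works from the folded MusicXML score directly.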