Evaluating Non-aligned Musical Score Transcriptions with MV2H
This work addresses a specific bottleneck in music transcription evaluation for researchers and developers. It is incremental, however: it extends an existing metric rather than proposing a new paradigm.
The paper tackles the problem of evaluating musical score transcriptions that are not time-aligned with the input. It introduces an automatic alignment method based on dynamic time warping, which lets the MV2H metric be applied to such non-aligned transcriptions and also lets widely available non-aligned scores serve as ground truth.
The original MV2H metric was designed to evaluate systems that transcribe an audio (or MIDI) input into a complete musical score. However, it requires both the transcribed score and the ground truth score to be time-aligned with the input. Some recent work has begun to transcribe directly from an audio signal into a musical score, skipping the alignment step. This paper introduces an automatic alignment method based on dynamic time warping that allows MV2H to be used to evaluate such non-aligned transcriptions. This has the additional benefit of allowing non-aligned musical scores, which are significantly more widely available than aligned ones, to be used as ground truth. The code for the improved MV2H, which now also includes a MusicXML parser and supports key and time signature changes, is available at www.github.com/apmcleod/MV2H.
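To make the alignment idea concrete, the following is a minimal sketch of textbook dynamic time warping over two note sequences. It is not the alignment code from the MV2H repository: the function name `dtw_align`, the representation of a score as a list of MIDI pitch numbers, and the absolute pitch difference used as the local cost are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def dtw_align(ref: List[int], hyp: List[int]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two pitch sequences with classic dynamic time warping.

    Hypothetical helper for illustration only; the local cost (absolute
    pitch difference) is an assumption, not the cost used by MV2H.
    Returns the total alignment cost and the list of (ref, hyp) index pairs.
    """
    n, m = len(ref), len(hyp)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning ref[:i] with hyp[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - hyp[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in ref only
                cost[i][j - 1],      # advance in hyp only
            )

    # Backtrack from the end to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        # Ties resolve to the first candidate, i.e. the diagonal step.
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # The hypothesis repeats pitch 62 once; DTW maps both copies to ref index 1.
    total, path = dtw_align([60, 62, 64, 65], [60, 62, 62, 64, 65])
    print(total, path)
```

Once such a path is available, each transcribed note can be assigned the time of the ground truth note it is matched to, which is the kind of time alignment the original metric expects. A practical alignment would likely compare richer features than pitch alone (for example onset position or simultaneous notes), so the cost function here should be read as a placeholder.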