CV · AI · Aug 31, 2024

Toward a More Complete OMR Solution

arXiv:2409.00316v1 · 3 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses a gap in optical music recognition (OMR), the digitization of music notation: most prior notation-assembly work assumes perfect object detection. The advance is incremental, since it builds on existing multi-stage pipelines.

The paper tackles optical music recognition by considering the object detection and notation assembly stages jointly, and shows that the authors' model outperforms existing models trained on perfect detection output.

Optical music recognition (OMR) aims to convert music notation into digital formats. One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image (object detection) and then assembles them into a music notation (notation assembly). Most previous work on notation assembly unrealistically assumes perfect object detection. In this study, we focus on the MUSCIMA++ v2.0 dataset, which represents musical notation as a graph with pairwise relationships among detected music objects, and we consider both stages together. First, we introduce a music object detector based on YOLOv8, which improves detection performance. Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output. We find that this model is able to outperform existing models trained on perfect detection output, showing the benefit of considering the detection and assembly stages in a more holistic way. These findings, together with our novel evaluation metric, are important steps toward a more complete OMR solution.
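The two-stage idea in the abstract (detect objects, then link them pairwise into a notation graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the `DetectedObject` fields, the `max_gap` and `threshold` parameters, the class labels, and the rule-based `toy_scorer` are all hypothetical stand-ins. In a real system the detections would presumably come from a trained detector and the scorer from a trained pairwise classifier.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Iterable


@dataclass(frozen=True)
class DetectedObject:
    """One detection: an id, a class label, and a bounding box (hypothetical schema)."""
    obj_id: int
    label: str
    top: float
    left: float
    bottom: float
    right: float


def box_gap(a: DetectedObject, b: DetectedObject) -> float:
    """Euclidean gap between two boxes; 0.0 if they touch or overlap."""
    dx = max(a.left - b.right, b.left - a.right, 0.0)
    dy = max(a.top - b.bottom, b.top - a.bottom, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def candidate_pairs(objects: Iterable[DetectedObject], max_gap: float):
    """Ordered pairs of distinct objects whose boxes are close enough to be related."""
    return [(a, b) for a, b in permutations(list(objects), 2)
            if box_gap(a, b) <= max_gap]


def assemble(objects: Iterable[DetectedObject],
             score_pair: Callable[[DetectedObject, DetectedObject], float],
             max_gap: float = 50.0,
             threshold: float = 0.5) -> set:
    """Keep a directed edge (from_id, to_id) for each candidate pair scored above threshold."""
    return {(a.obj_id, b.obj_id)
            for a, b in candidate_pairs(objects, max_gap)
            if score_pair(a, b) >= threshold}


def toy_scorer(a: DetectedObject, b: DetectedObject) -> float:
    """Hypothetical rule standing in for a learned classifier: link noteheads to stems."""
    return 1.0 if a.label == "noteheadFull" and b.label == "stem" else 0.0


if __name__ == "__main__":
    detections = [
        DetectedObject(0, "noteheadFull", top=100, left=10, bottom=110, right=22),
        DetectedObject(1, "stem", top=60, left=21, bottom=105, right=23),
        DetectedObject(2, "stem", top=60, left=500, bottom=105, right=502),
    ]
    print(assemble(detections, toy_scorer))
```

The distance filter keeps the number of scored pairs manageable. Because the assembly step consumes detector output rather than ground-truth boxes, missed or spurious detections propagate into the graph, which is the situation the abstract argues should be trained and evaluated on directly.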

Code Implementations: 1 repo
