SD CV Apr 22

From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR

arXiv:2604.20522
AI Analysis

This work targets editable, verifiable music scores recovered from images, a need shared by musicians and researchers. It is incremental, however, because it builds on existing two-stage pipelines.

The paper addresses the conversion of visual music notation into structured scores. It proposes a decoding approach for the second stage of a two-stage Optical Music Recognition (OMR) pipeline, aimed at complex polyphonic notation and focused on the voice-separation and intra-measure timing bottlenecks. The result is a practical decoding component for real OMR systems.

We propose a new approach for the second stage of a practical two-stage Optical Music Recognition (OMR) pipeline. Given symbol and event candidates from the visual pipeline, we decode them into an editable, verifiable, and exportable score structure. We focus on complex polyphonic staff notation, especially piano scores, where voice separation and intra-measure timing are the main bottlenecks. Our approach formulates second-stage decoding as a structure decoding problem and uses topology recognition with probability-guided search (BeadSolver) as its core method. We also describe a data strategy that combines procedural generation with recognition-feedback annotations. The result is a practical decoding component for real OMR systems and a path to accumulate structured score data for future end-to-end, multimodal, and RL-style methods.
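
To make the idea of probability-guided structure decoding concrete, here is a minimal Python sketch. It is not the paper's BeadSolver, whose details the abstract does not give. It assumes a hypothetical setup in which the visual stage supplies, for each event in a measure, a duration, a probability of starting a voice, and pairwise probabilities that one event directly follows another in the same voice. The names `decode_measure`, `succ_prob`, `start_prob`, and `START` are all illustrative. A best-first search then picks the most probable predecessor for every event, which yields both a voice assignment and onset times.

```python
import heapq
import math
from fractions import Fraction

START = -1  # hypothetical marker: the event opens a new voice at the measure start


def decode_measure(durations, succ_prob, start_prob):
    """Best-first search over predecessor assignments (illustrative only).

    durations[j]    : Fraction, notated duration of event j (events sorted left to right)
    succ_prob[i][j] : assumed probability that event j directly follows event i (i < j)
    start_prob[j]   : assumed probability that event j starts a voice
    Returns (preds, onsets, log_prob) for the most probable assignment, or None.
    """
    n = len(durations)
    # Heap entries are (cost, preds), where cost is the negative log-probability.
    # Costs never decrease along a path, so the first complete state popped is optimal.
    heap = [(0.0, ())]
    while heap:
        cost, preds = heapq.heappop(heap)
        j = len(preds)
        if j == n:
            # Derive onsets: a voice start sits at 0, otherwise right after its predecessor.
            onsets = []
            for p in preds:
                onsets.append(Fraction(0) if p == START else onsets[p] + durations[p])
            return list(preds), onsets, -cost
        # An event can have at most one direct successor within its voice.
        used = {p for p in preds if p != START}
        candidates = [(START, start_prob[j])]
        candidates += [(i, succ_prob[i][j]) for i in range(j) if i not in used]
        for p, prob in candidates:
            if prob > 0:
                heapq.heappush(heap, (cost - math.log(prob), preds + (p,)))
    return None


# Toy measure: events 0 and 2 form an upper voice, event 1 is a held lower-voice note.
durations = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
start_prob = [0.9, 0.8, 0.1]
succ_prob = [
    [0.0, 0.1, 0.9],
    [0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0],
]
preds, onsets, log_prob = decode_measure(durations, succ_prob, start_prob)
print(preds, onsets)
```

In this toy input the search keeps events 0 and 1 as separate voice starts and attaches event 2 to event 0, so event 2 begins a quarter note into the measure. A real decoder would also need rests, tuplets, chords, and measure-length constraints, none of which this sketch models.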
