CV | SD | AS | Feb 12, 2024

Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription

arXiv:2402.07596v2 | 28 citations | h-index: 4 | ICDAR
AI Analysis

The work addresses scalability issues and other limitations of OMR when handling intricate music structures such as polyphony; it is a significant but incremental advance over existing monophonic transcription techniques.

The paper tackles the problem of transcribing complex polyphonic musical scores in Optical Music Recognition (OMR) by introducing the Sheet Music Transformer, an end-to-end model that outperforms state-of-the-art methods on two polyphonic datasets.

State-of-the-art end-to-end Optical Music Recognition (OMR) has, to date, primarily been carried out using monophonic transcription techniques to handle complex score layouts, such as polyphony, often by resorting to simplifications or specific adaptations. Despite their efficacy, these approaches imply challenges related to scalability and limitations. This paper presents the Sheet Music Transformer, the first end-to-end OMR model designed to transcribe complex musical scores without relying solely on monophonic strategies. Our model employs a Transformer-based image-to-sequence framework that predicts score transcriptions in a standard digital music encoding format from input images. Our model has been tested on two polyphonic music datasets and has proven capable of handling these intricate music structures effectively. The experimental outcomes not only indicate the competence of the model, but also show that it is better than the state-of-the-art methods, thus contributing to advancements in end-to-end OMR transcription.

Code Implementations: 1 repo