CLSDAS Aug 18, 2023

TrOMR: Transformer-Based Polyphonic Optical Music Recognition

arXiv:2308.09370v1, 11 citations, h-index: 63
Originality: Incremental advance
AI Analysis

This paper addresses the problem of accurately recognizing complex music scores, a need shared by musicians and researchers, and represents an incremental improvement over existing methods.

The paper tackles polyphonic optical music recognition by proposing TrOMR, a transformer-based end-to-end approach with a novel consistency loss and data annotation method, which outperforms current methods in real-world scenarios.

Optical Music Recognition (OMR) is an important technology in music and has been researched for a long time. Previous approaches to OMR are usually based on CNNs for image understanding and RNNs for music symbol classification. In this paper, we propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores. Extensive experiments demonstrate that TrOMR outperforms current OMR methods, especially in real-world scenarios. We also develop a TrOMR system and build a camera scene dataset for full-page music scores in real-world settings. The code and datasets will be made available for reproducibility.

Code Implementations: 1 repo
