CVDLJun 23, 2025

LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR

UW
arXiv:2506.19065v21 citationsh-index: 5
Originality Highly original
AI Analysis

This addresses the problem of automating music score digitization for musicians and archivists, representing a significant advance over prior methods.

The authors tackled optical music recognition (OMR) by proposing Legato, an end-to-end model that converts music score images to machine-readable ABC notation, achieving a 68% and 47.6% absolute error reduction on TEDn and OMR-NED metrics.

We propose Legato, a new end-to-end model for optical music recognition (OMR), a task of converting music score images to machine-readable documents. Legato is the first large-scale pretrained OMR model capable of recognizing full-page or multi-page typeset music scores and the first to generate documents in ABC notation, a concise, human-readable format for symbolic music. Bringing together a pretrained vision encoder with an ABC decoder trained on a dataset of more than 214K images, our model exhibits the strong ability to generalize across various typeset scores. We conduct comprehensive experiments on a range of datasets and metrics and demonstrate that Legato outperforms the previous state of the art. On our most realistic dataset, we see a 68\% and 47.6\% absolute error reduction on the standard metrics TEDn and OMR-NED, respectively.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes