Understanding Optical Music Recognition
This tutorial makes OMR more accessible to new researchers, especially those without a musical background, by clarifying the field's terminology and structure. Its contribution is incremental, since it builds on existing work.
The paper addresses the accessibility and definitional challenges of Optical Music Recognition (OMR) by providing a robust definition, analyzing how OMR inverts the music encoding process, and proposing a taxonomy of the field that includes a novel taxonomy of applications. The aim is to help readers understand OMR's objectives, structure, and state of the art.
For over 50 years, researchers have been trying to teach computers to read music notation, a task referred to as Optical Music Recognition (OMR). However, the field is still difficult to access for new researchers, especially those without a significant musical background: few introductory materials are available, and the field has struggled to define itself and build a shared terminology. In this tutorial, we address these shortcomings by (1) providing a robust definition of OMR and its relationship to related fields, (2) analyzing how OMR inverts the music encoding process to recover the musical notation and the musical semantics from documents, and (3) proposing a taxonomy of OMR, most notably a novel taxonomy of applications. Additionally, we discuss how deep learning affects modern OMR research, in contrast to the traditional pipeline. Based on this work, the reader should be able to attain a basic understanding of OMR: its objectives, its inherent structure, its relationship to other fields, the state of the art, and the research opportunities it affords.