An Experimental Analysis of the Entanglement Problem in Neural-Network-based Music Transcription Systems
This work identifies a key bottleneck for researchers and developers working on music transcription, highlighting an issue that hinders further incremental progress in the field.
The paper investigates the 'entanglement problem' in neural-network-based polyphonic music transcription systems, finding that these networks learn specific note combinations and struggle to generalize to unseen ones, which limits further performance improvements.
Several recent polyphonic music transcription systems have used deep neural networks to achieve state-of-the-art results on various benchmark datasets, pushing the envelope on framewise and note-level performance measures. Unfortunately, we can observe a sort of glass ceiling effect. To investigate this effect, we provide a detailed analysis of the particular kinds of errors that state-of-the-art deep neural transcription systems make when trained and tested on a piano transcription task. We are ultimately forced to draw a rather disheartening conclusion: the networks seem to learn combinations of notes and have a hard time generalizing to unseen combinations of notes. Furthermore, we speculate on various means to alleviate this situation.
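The following is a minimal, hypothetical sketch of one way to quantify the generalization gap described above. It is not the paper's evaluation protocol. It assumes that piano rolls are given as one set of active MIDI pitch numbers per frame, and it treats a test frame as "seen" only if its exact set of active pitches also occurs somewhere in the training data. It then reports framewise F-measure separately for seen and unseen combinations. The function names `framewise_counts`, `f_measure`, and `split_by_seen_combinations` are illustrative inventions, and note-level measures are not covered.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical representation: one frame = the set of active MIDI pitches.
Frame = FrozenSet[int]


def framewise_counts(ref: Frame, est: Frame) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one frame."""
    tp = len(ref & est)   # pitches present in both reference and estimate
    fp = len(est - ref)   # pitches estimated but not in the reference
    fn = len(ref - est)   # reference pitches the estimate missed
    return tp, fp, fn


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure from aggregated counts; 0.0 when nothing is active at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def split_by_seen_combinations(
    train_frames: Iterable[Frame],
    test_ref: List[Frame],
    test_est: List[Frame],
) -> Dict[str, float]:
    """Aggregate framewise F-measure separately for test frames whose exact
    reference note combination occurs in training ("seen") and those whose
    combination never occurs in training ("unseen")."""
    seen_combos = set(train_frames)
    totals = {"seen": [0, 0, 0], "unseen": [0, 0, 0]}
    for ref, est in zip(test_ref, test_est):
        key = "seen" if ref in seen_combos else "unseen"
        for i, count in enumerate(framewise_counts(ref, est)):
            totals[key][i] += count
    return {key: f_measure(*counts) for key, counts in totals.items()}


if __name__ == "__main__":
    # Toy data: a C major triad occurs in training, a D minor triad does not.
    train = [frozenset({60, 64, 67}), frozenset({60}), frozenset()]
    test_ref = [frozenset({60, 64, 67}), frozenset({62, 65, 69})]
    test_est = [frozenset({60, 64, 67}), frozenset({62, 65})]
    print(split_by_seen_combinations(train, test_ref, test_est))
    # prints {'seen': 1.0, 'unseen': 0.8}
```

Counts are pooled over all frames in each group before the F-measure is computed, rather than averaging per-frame scores. A markedly lower score in the "unseen" group would be consistent with the networks having memorized note combinations. Splitting on exact pitch sets is a deliberately strict criterion, and other criteria (for example, partial overlap with training chords) would be equally plausible.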