SD · LG · MM · AS · Feb 4, 2022

Polyphonic pitch detection with convolutional recurrent neural networks

arXiv:2202.02115v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work addresses automatic music transcription for polyphonic audio. Its contribution is incremental: it carries deep learning techniques that have already proven useful in speech recognition over to music.

The paper tackles polyphonic pitch detection with an online system that streams audio to MIDI using ConvLSTMs. It reports state-of-the-art results on the 2007 MIREX multi-F0 development set, with an F-measure of 83% on the bassoon, clarinet, flute, horn and oboe ensemble recording, without relying on musical language modelling or assumptions about instrument timbre.

Recent directions in automatic speech recognition (ASR) research have shown that applying deep learning models from image recognition challenges in computer vision is beneficial. As automatic music transcription (AMT) is superficially similar to ASR, in the sense that methods often rely on transforming spectrograms to symbolic sequences of events (e.g. words or notes), deep learning should benefit AMT as well. In this work, we outline an online polyphonic pitch detection system that streams audio to MIDI using ConvLSTMs. Our system achieves state-of-the-art results on the 2007 MIREX multi-F0 development set, with an F-measure of 83% on the bassoon, clarinet, flute, horn and oboe ensemble recording, without requiring any musical language modelling or assumptions of instrument timbre.
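The abstract reports its headline result as an F-measure, so a small sketch of how such a score can be computed may help. This page does not say exactly how the paper scores its output, so the code below assumes a common frame-level convention for multi-F0 evaluation: each frame is represented by the set of active MIDI pitch numbers, and a pitch counts as correct when it appears in both the reference and the estimate for that frame. The function name `frame_f_measure` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def frame_f_measure(
    reference: List[Set[int]],
    estimate: List[Set[int]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) pooled over all frames.

    Each frame is the set of MIDI pitch numbers active in that frame.
    Assumption: frame-level scoring; the paper's protocol may differ.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same number of frames")

    # A true positive is a pitch present in both sets for the same frame.
    true_pos = sum(len(ref & est) for ref, est in zip(reference, estimate))
    n_est = sum(len(est) for est in estimate)  # all detected pitches
    n_ref = sum(len(ref) for ref in reference)  # all ground-truth pitches

    precision = true_pos / n_est if n_est else 0.0
    recall = true_pos / n_ref if n_ref else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy data (made up for illustration): four frames of a short passage.
reference = [{60, 64, 67}, {60, 64, 67}, {62, 65}, set()]
estimate = [{60, 64}, {60, 64, 67, 72}, {62, 65}, {59}]

precision, recall, f_measure = frame_f_measure(reference, estimate)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

In the toy data, 7 of the 9 detected pitches are correct and 7 of the 8 reference pitches are found, so a missed note lowers recall while a spurious note lowers precision. The F-measure is the harmonic mean of the two.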

