AS · LG · SD · Dec 15, 2023

Toward Deep Drum Source Separation

Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti

arXiv:2312.09663v3 · 3.39 citations · h-index: 8 · Has Code · Pattern Recognition Letters

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in drum source separation, a domain-specific advance for audio processing applications.

The paper tackles drum source separation by introducing two contributions: StemGMD, a large-scale dataset of 1224 hours of isolated drum stems, and LarsNet, a deep learning model that separates five stems from stereo drum mixtures faster than real-time and significantly outperforms state-of-the-art nonnegative spectro-temporal factorization methods.
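"Faster than real-time" is commonly quantified with the real-time factor (RTF): wall-clock processing time divided by the duration of the audio processed, so an RTF below 1 means the model keeps up with playback. The sketch below is a minimal, hedged illustration of that measurement; `separate_fn` is a hypothetical stand-in for any separation callable, not LarsNet's actual API, and the paper's own benchmarking setup may differ.

```python
import time
from typing import Callable, Sequence


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; RTF < 1 means faster than real-time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(separate_fn: Callable[[Sequence], object],
                mixture: Sequence, sample_rate: int) -> float:
    """Time one call of a (hypothetical) separator on a mixture.

    `mixture` is a sequence of frames (e.g. (left, right) pairs for stereo),
    so len(mixture) / sample_rate is the clip duration in seconds.
    """
    start = time.perf_counter()
    separate_fn(mixture)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(mixture) / sample_rate)
```

For instance, a 10-second clip separated in 2.5 seconds has an RTF of 0.25, i.e. four times faster than real-time.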

In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.
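The "bank of dedicated U-Nets" idea, one network per target stem with every network reading the same stereo mixture, can be sketched as below. Several details are assumptions not stated in the abstract: the five stems are taken to be kick, snare, toms, hi-hat, and cymbals; the nine kit pieces are grouped into them as in the hypothetical `NINE_TO_FIVE` table; and each network is assumed to predict a soft mask in [0, 1] over the mixture's magnitude spectrogram. `constant_mask` is a trivial placeholder standing in for a trained U-Net, and all names here are illustrative rather than taken from the LarsNet code.

```python
from typing import Callable, Dict, List

# Hypothetical grouping of a canonical nine-piece kit into five target stems.
NINE_TO_FIVE: Dict[str, str] = {
    "kick": "kick",
    "snare": "snare",
    "high_tom": "toms",
    "low_mid_tom": "toms",
    "high_floor_tom": "toms",
    "closed_hihat": "hihat",
    "open_hihat": "hihat",
    "crash": "cymbals",
    "ride": "cymbals",
}
STEMS = ["kick", "snare", "toms", "hihat", "cymbals"]

Signal = List[float]              # mono waveform samples
Spec = List[List[List[float]]]    # magnitude spectrogram indexed [channel][freq][time]
MaskNet = Callable[[Spec], Spec]  # returns a mask with the same shape as its input


def group_pieces(pieces: Dict[str, Signal]) -> Dict[str, Signal]:
    """Sum isolated per-piece waveforms into five stem targets (equal lengths assumed)."""
    length = len(next(iter(pieces.values())))
    targets = {stem: [0.0] * length for stem in STEMS}
    for piece, signal in pieces.items():
        target = targets[NINE_TO_FIVE[piece]]
        for i, x in enumerate(signal):
            target[i] += x
    return targets


def apply_mask(mix: Spec, mask: Spec) -> Spec:
    """Multiply mixture magnitudes element-wise by a mask clipped to [0, 1]."""
    return [[[m * min(max(w, 0.0), 1.0) for m, w in zip(mrow, wrow)]
             for mrow, wrow in zip(mch, wch)]
            for mch, wch in zip(mix, mask)]


def separate(mix: Spec, bank: Dict[str, MaskNet]) -> Dict[str, Spec]:
    """Run each dedicated network independently on the same mixture."""
    return {stem: apply_mask(mix, net(mix)) for stem, net in bank.items()}


def constant_mask(value: float) -> MaskNet:
    """Placeholder for a trained U-Net: predicts the same mask value everywhere."""
    return lambda spec: [[[value] * len(row) for row in ch] for ch in spec]


if __name__ == "__main__":
    # Two channels, two frequency bins, two frames.
    mix = [[[1.0, 2.0], [4.0, 0.5]], [[1.0, 2.0], [4.0, 0.5]]]
    bank = {stem: constant_mask(0.2) for stem in STEMS}
    print(separate(mix, bank)["kick"][0][0])  # [0.2, 0.4]
```

Keeping the separators independent means each one can be trained, replaced, or run in parallel on its own; the trade-off is that nothing in this structure forces the five estimates to add back up to the mixture.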

