AS LG SD · Oct 22, 2019

Simultaneous Separation and Transcription of Mixtures with Multiple Polyphonic and Percussive Instruments

Ethan Manilow, Prem Seetharaman, Bryan Pardo

arXiv:1910.12621v2 · 16.449 citations

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of analyzing complex audio recordings for music processing applications, though the advance is incremental because it builds on existing source separation methods.

The paper tackles the problem of simultaneously separating and transcribing musical mixtures with multiple instruments. It proposes a single deep learning architecture (Cerberus) that improves both tasks and generalizes better to unseen data.

We present a single deep learning architecture that can both separate an audio recording of a musical mixture into constituent single-instrument recordings and transcribe these instruments into a human-readable format at the same time, learning a shared musical representation for both tasks. This novel architecture, which we call Cerberus, builds on the Chimera network for source separation by adding a third "head" for transcription. By training each head with different losses, we are able to jointly learn how to separate and transcribe up to 5 instruments in our experiments with a single network. We show that the two tasks are highly complementary and, when learned jointly, lead to Cerberus networks that are better at both separation and transcription and that generalize better to unseen mixtures.
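The idea of one shared representation feeding three heads, each trained with its own loss, can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `ThreeHeadSketch`, the layer sizes, the single tanh layer standing in for the shared network, and the loss weights are all hypothetical. Splitting the two separation heads into an embedding head and a mask head is an assumption about how Chimera is organized, and the abstract does not say which loss each head uses, so only the weighted combination is shown.

```python
import math
import random


def linear(x, w, b):
    """Apply y = W x + b to one feature vector stored as a plain list."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init(rows, cols, rng):
    """Small random weight matrix plus a zero bias (illustrative only)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return w, [0.0] * rows


class ThreeHeadSketch:
    """Hypothetical stand-in: one shared trunk, three output heads."""

    def __init__(self, n_freq, n_hidden, n_sources, emb_dim, n_pitches, seed=0):
        rng = random.Random(seed)
        self.trunk = init(n_hidden, n_freq, rng)                      # shared representation
        self.emb_head = init(n_freq * emb_dim, n_hidden, rng)         # separation head 1 (assumed)
        self.mask_head = init(n_freq * n_sources, n_hidden, rng)      # separation head 2 (assumed)
        self.trans_head = init(n_sources * n_pitches, n_hidden, rng)  # transcription head

    def forward(self, frame):
        """Map one spectrogram frame to the outputs of all three heads."""
        h = [math.tanh(v) for v in linear(frame, *self.trunk)]
        return {
            "embedding": linear(h, *self.emb_head),
            "masks": [sigmoid(v) for v in linear(h, *self.mask_head)],
            "transcription": [sigmoid(v) for v in linear(h, *self.trans_head)],
        }


def combined_loss(losses, weights):
    """Weighted sum of per-head losses; the weights are assumed hyperparameters."""
    return sum(weights[name] * losses[name] for name in weights)


# Five sources, matching the "up to 5 instruments" in the abstract.
model = ThreeHeadSketch(n_freq=8, n_hidden=16, n_sources=5, emb_dim=4, n_pitches=12)
out = model.forward([0.5] * 8)
```

Because every head reads the same hidden vector `h`, a gradient from any one loss would update the shared trunk, which is one plausible reading of how joint training could let separation and transcription inform each other.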

