AS · CL · LG · SD · Jul 9, 2024

Remastering Divide and Remaster: A Cinematic Audio Source Separation Dataset with Multilingual Support

arXiv:2407.07275v2 · 6 citations · h-index: 9 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a data bottleneck for researchers in audio source separation by providing a more diverse and improved dataset, though it is incremental as it builds on an existing dataset.

The authors address the lack of linguistic diversity in cinematic audio source separation datasets by developing DnR v3, which includes speech from more than 30 languages. They find that training on this multilingual data improves model generalizability, including in languages with little available data, and that the multilingual model often performs on par with or better than models trained on monolingual data.

Cinematic audio source separation (CASS), as a problem of extracting the dialogue, music, and effects stems from their mixture, is a relatively new subtask of audio source separation. To date, only one publicly available dataset exists for CASS, that is, the Divide and Remaster (DnR) dataset, which is currently at version 2. While DnR v2 has been an incredibly useful resource for CASS, several areas of improvement have been identified, particularly through its use in the 2023 Sound Demixing Challenge. In this work, we develop version 3 of the DnR dataset, addressing issues relating to vocal content in non-dialogue stems, loudness distributions, mastering process, and linguistic diversity. In particular, the dialogue stem of DnR v3 includes speech content from more than 30 languages from multiple families including but not limited to the Germanic, Romance, Indo-Aryan, Dravidian, Malayo-Polynesian, and Bantu families. Benchmark results using the Bandit model indicated that training on multilingual data yields significant generalizability to the model even in languages with low data availability. Even in languages with high data availability, the multilingual model often performs on par or better than dedicated models trained on monolingual CASS datasets. Dataset and model implementation will be made available at https://github.com/kwatcharasupat/source-separation-landing.
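To make the problem statement concrete, here is a minimal sketch of the CASS setup described in the abstract: a mixture formed by summing dialogue, music, and effects stems, and a score for an estimated stem. The sample values are made up for illustration, the helper names `mix_stems` and `snr_db` are hypothetical, and the use of signal-to-noise ratio as the score is an assumption; the abstract does not say which metric the benchmark reports.

```python
import math


def mix_stems(dialogue, music, effects):
    """Sum three equal-length stems sample by sample to form the mixture."""
    if not (len(dialogue) == len(music) == len(effects)):
        raise ValueError("stems must have the same length")
    return [d + m + e for d, m, e in zip(dialogue, music, effects)]


def snr_db(reference, estimate):
    """Signal-to-noise ratio (dB) of an estimated stem against its reference.

    Assumed metric for illustration only; not taken from the paper.
    """
    signal = sum(r * r for r in reference)
    noise = sum((r - s) ** 2 for r, s in zip(reference, estimate))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


# Toy stems (made-up values, four samples each).
dialogue = [0.5, -0.5, 0.5, -0.5]
music = [0.1, 0.1, -0.1, -0.1]
effects = [0.0, 0.2, 0.0, -0.2]

mixture = mix_stems(dialogue, music, effects)

# Trivial baseline: treat the unprocessed mixture as the dialogue estimate.
baseline_snr = snr_db(dialogue, mixture)
print(f"dialogue SNR of the raw mixture: {baseline_snr:.2f} dB")
```

A separation model is useful to the extent that its estimated stems score above this do-nothing baseline.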

Code Implementations: 1 repo