SDAILGMMASMar 15, 2024

MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage

arXiv:2403.10024v13 citationsh-index: 15
Originality Incremental advance
AI Analysis

This addresses fragmentation issues in automatic music transcription for applications like music analysis, though it is incremental as it builds on an existing SOTA model.

The paper tackled instrument leakage in multi-instrument music transcription by enhancing the MT3 model with memory retention and other techniques, resulting in improved onset F1 scores and reduced leakage on the Slakh2100 dataset.

This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA) token-based multi-instrument automatic music transcription (AMT) model. Despite SOTA performance, MT3 has the issue of instrument leakage, where transcriptions are fragmented across different instruments. To mitigate this, we propose MR-MT3, with enhancements including a memory retention mechanism, prior token sampling, and token shuffling are proposed. These methods are evaluated on the Slakh2100 dataset, demonstrating improved onset F1 scores and reduced instrument leakage. In addition to the conventional multi-instrument transcription F1 score, new metrics such as the instrument leakage ratio and the instrument detection F1 score are introduced for a more comprehensive assessment of transcription quality. The study also explores the issue of domain overfitting by evaluating MT3 on single-instrument monophonic datasets such as ComMU and NSynth. The findings, along with the source code, are shared to facilitate future work aimed at refining token-based multi-instrument AMT models.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes