MM · CV · SD · AS · Mar 25, 2022

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

arXiv:2203.13535v1 · 3 citations · h-index: 98
Originality: Incremental advance
AI Analysis

This work extends visual sound separation, a task in audio-visual processing, to a more general and challenging scenario. The advance is incremental, since it builds on existing deep learning approaches.

The paper tackles the separation of sounds from musical instrument categories that were not seen during training, using visual inputs as guidance. It proposes the SeCo framework, which combines consistency constraints with an online matching strategy, and reports significant improvements over baseline methods.

Recent years have witnessed the success of deep learning on the visual sound separation task. However, existing works follow similar settings where the training and testing datasets share the same musical instrument categories, which to some extent limits the versatility of this task. In this work, we focus on a more general and challenging scenario, namely the separation of unknown musical instruments, where the categories in training and testing phases have no overlap with each other. To tackle this new setting, we propose the Separation-with-Consistency (SeCo) framework, which can accomplish the separation on unknown categories by exploiting the consistency constraints. Furthermore, to capture richer characteristics of the novel melodies, we devise an online matching strategy, which can bring stable enhancements with no cost of extra parameters. Experiments demonstrate that our SeCo framework exhibits strong adaptation ability on the novel musical categories and outperforms the baseline methods by a significant margin.
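The setting described in the abstract, where training and testing categories have no overlap, can be illustrated with a category-disjoint data split. The following is a minimal sketch under stated assumptions, not the authors' data pipeline: the function name `split_by_category`, the clip identifiers, the instrument names, and the 30% test fraction are all hypothetical and chosen only for illustration.

```python
import random


def split_by_category(clips, test_fraction=0.3, seed=0):
    """Split (clip_id, category) pairs so that train and test share no category.

    Hypothetical helper for illustration; not taken from the SeCo paper.
    """
    categories = sorted({category for _, category in clips})
    if len(categories) < 2:
        raise ValueError("need at least two categories for a disjoint split")

    # Shuffle the category names (not the clips) with a seeded generator,
    # then hold out whole categories for testing.
    rng = random.Random(seed)
    rng.shuffle(categories)
    n_test = max(1, round(len(categories) * test_fraction))
    n_test = min(n_test, len(categories) - 1)  # keep at least one train category
    held_out = set(categories[:n_test])

    train = [clip for clip in clips if clip[1] not in held_out]
    test = [clip for clip in clips if clip[1] in held_out]
    return train, test


# Toy data: two clips for each of four assumed instrument categories.
clips = [
    ("clip_000", "violin"), ("clip_001", "violin"),
    ("clip_002", "flute"), ("clip_003", "flute"),
    ("clip_004", "guitar"), ("clip_005", "guitar"),
    ("clip_006", "trumpet"), ("clip_007", "trumpet"),
]

train, test = split_by_category(clips)
train_cats = {category for _, category in train}
test_cats = {category for _, category in test}

# Every test category is unknown to the training set.
assert train_cats.isdisjoint(test_cats)
```

Splitting on category names rather than on individual clips is what guarantees that every test instrument is unseen during training. A conventional per-clip random split would not give that guarantee.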

