CV · SD · AS · Apr 11, 2019

The Sound of Motions

arXiv:1904.05979v1 · 271 citations
Originality: Incremental advance
AI Analysis

This work addresses sound separation for audio-visual applications and offers incremental improvements over appearance-based methods.

The paper tackles sound localization and separation by using motion cues from videos, improving performance in separating musical instrument sounds and addressing duets of the same instrument category.
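One common way to train separation on unlabeled video is the "mix-and-separate" scheme. You add the audio of two solo clips, then ask a network conditioned on each clip's visual (here, motion) features to recover that clip's own source. The sketch below is an assumption about this kind of setup, not the paper's code. It only builds the mixture and the ratio-mask targets such a network would regress. The function names are hypothetical, and the sketch approximates the mixture magnitude as the sum of the source magnitudes, which ignores phase.

```python
import numpy as np


def mix_and_separate_targets(spec_a, spec_b, eps=1e-8):
    """Build one synthetic training example from two solo clips.

    spec_a, spec_b: non-negative magnitude spectrograms of equal shape
    (frequency bins x time frames), e.g. |STFT| of two unlabeled solo videos.
    Returns the mixture and the ratio mask each source would need.
    """
    spec_a = np.asarray(spec_a, dtype=np.float64)
    spec_b = np.asarray(spec_b, dtype=np.float64)
    # Assumption: mixture magnitude ~= sum of source magnitudes (phase ignored).
    mix = spec_a + spec_b
    # Ratio masks: the fraction of each time-frequency bin owned by each source.
    mask_a = spec_a / (mix + eps)
    mask_b = spec_b / (mix + eps)
    return mix, mask_a, mask_b


def apply_mask(mix, mask):
    """Estimate a source by element-wise masking of the mixture spectrogram."""
    return mix * mask


# Usage with a toy 2 x 3 spectrogram pair.
a = np.array([[1.0, 0.0, 2.0], [0.5, 0.5, 0.0]])
b = np.array([[3.0, 1.0, 0.0], [0.5, 1.5, 0.0]])
mix, mask_a, mask_b = mix_and_separate_targets(a, b)
estimate_a = apply_mask(mix, mask_a)
```

Because the true sources are known for the synthetic mixture, no labels are needed. At test time the network's predicted mask replaces `mask_a`. Binary masks (1 where a source dominates a bin, else 0) are a common alternative to ratio masks.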

Sounds originate from object motions and the vibrations of the surrounding air. Inspired by the fact that humans are capable of interpreting sound sources from how objects move visually, we propose a novel system that explicitly captures such motion cues for the task of sound localization and separation. Our system is composed of an end-to-end learnable model called Deep Dense Trajectory (DDT) and a curriculum learning scheme. It exploits the inherent coherence of audio-visual signals from large quantities of unlabeled videos. Quantitative and qualitative evaluations show that, compared to previous models that rely on visual appearance cues, our motion-based system improves performance in separating musical instrument sounds. Furthermore, it separates sound components from duets of the same category of instruments, a challenging problem that has not been addressed before.
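As its name suggests, DDT builds on the idea of dense trajectories: points sampled on a regular grid are followed through time by chaining optical flow. The minimal NumPy sketch below covers only that classical, non-learned tracking step. It is not the paper's end-to-end model. The `flows` array layout, the `stride` default and the nearest-neighbour flow lookup are assumptions; real implementations usually interpolate or filter the flow.

```python
import numpy as np


def dense_trajectories(flows, stride=4):
    """Advect a regular grid of points through consecutive optical-flow fields.

    flows: array of shape (T, H, W, 2); flows[t, y, x] = (dx, dy), the motion
           of pixel (x, y) from frame t to frame t + 1 (hypothetical layout).
    stride: spacing in pixels of the initial sampling grid (hypothetical default).
    Returns positions of shape (T + 1, N, 2) as (x, y) for N tracked points.
    """
    flows = np.asarray(flows, dtype=np.float64)
    T, H, W, _ = flows.shape
    ys, xs = np.mgrid[0:H:stride, 0:W:stride]
    points = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    track = [points.copy()]
    for t in range(T):
        # Nearest-neighbour flow lookup at each point's current position,
        # clamped to the frame so points leaving the image still get a value.
        xi = np.clip(np.rint(points[:, 0]).astype(int), 0, W - 1)
        yi = np.clip(np.rint(points[:, 1]).astype(int), 0, H - 1)
        points = points + flows[t, yi, xi]
        track.append(points.copy())
    return np.stack(track)


# Usage: three frames of uniform rightward motion of 1 pixel per frame.
flows = np.zeros((3, 8, 8, 2))
flows[..., 0] = 1.0
traj = dense_trajectories(flows, stride=4)
displacements = np.diff(traj, axis=0)  # per-step motion, shape (3, N, 2)
```

The per-step displacements are one simple motion descriptor. A learned model such as DDT would presumably extract richer features along these paths and feed them to the audio separation network.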

Code Implementations: 1 repo