CV · NE · Nov 9, 2018

Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

arXiv:1811.04000v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses audiovisual object extraction in large video collections using weak supervision. It is rated incremental because it extends the authors' previous audiovisual representation learning framework rather than introducing a new one.

The paper tackles audiovisual scene analysis for weakly-labeled video data by integrating audio source enhancement into an existing framework, achieving object classification in noisy acoustic environments and showing encouraging visual object localization results on a music instrument performance dataset.

We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments over a challenging dataset of music instrument performance videos. We also show encouraging visual object localization results.
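For readers unfamiliar with non-negative matrix factorization (NMF), the sketch below shows the generic idea on a toy magnitude "spectrogram": a non-negative matrix V is approximated as W H, where the columns of W act as spectral templates and the rows of H as their activations over time. It uses the standard multiplicative updates for the squared Euclidean cost. This is an illustration only, not the paper's pipeline: the function names (`nmf`, `frobenius_error`), the toy data, the rank, and the iteration count are all assumptions made for this example.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    B_cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x frames) into W (freq x rank)
    and H (rank x frames) with multiplicative updates (Euclidean cost)."""
    rng = random.Random(seed)
    n_freq, n_frames = len(V), len(V[0])
    # Random positive initialisation keeps every later update non-negative.
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    H = [[rng.random() + eps for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H

# Toy data (hypothetical): two spectral templates mixed over six time frames.
W_true = [[1.0, 0.0], [0.5, 0.2], [0.0, 1.0], [0.1, 0.8]]
H_true = [[1.0, 0.5, 0.2, 0.1, 0.6, 0.9],
          [0.1, 0.4, 0.9, 1.0, 0.3, 0.2]]
V = matmul(W_true, H_true)

W, H = nmf(V, rank=2)
print("reconstruction error:", frobenius_error(V, W, H))
```

In a setup like this, each row of H indicates when a template is active, which is the kind of per-component information a weakly supervised model could score or select from. How the paper actually couples the decomposition with multiple instance learning is described in the paper itself, not here.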

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
