CV | IV | Mar 21, 2020

Appearance Fusion of Multiple Cues for Video Co-localization

arXiv:2003.09556v2
AI Analysis

This work addresses object discovery in videos for computer vision applications, representing an incremental improvement over existing fusion strategies.

The paper tackles video co-localization by proposing an appearance fusion method that combines multiple object cues into a single Gaussian Mixture Model, guided by reliability and consensus, and shows it outperforms spatial fusion and state-of-the-art methods on YouTube datasets.

This work addresses the joint object discovery problem in videos using multiple object-related cues. In contrast to the usual spatial fusion approach, it presents a novel appearance fusion approach. Specifically, the paper proposes an effective process for fusing the different Gaussian Mixture Models (GMMs) derived from multiple cues into one GMM. Like any fusion strategy, this approach needs some guidance, which the proposed method draws from the reliability and consensus phenomena. As a case study, we apply the methodology to the "video co-localization" object discovery problem. Our experiments on the YouTube Objects and YouTube Co-localization datasets demonstrate that the proposed appearance fusion has an advantage over both the spatial fusion strategy and the current state-of-the-art video co-localization methods.
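To make the idea of merging several cue-specific GMMs into one more concrete, here is a minimal sketch. The abstract does not state the fusion rule, so everything below is an assumption for illustration and not the authors' method: each cue's GMM is pooled into one mixture, and each cue's components are rescaled by a per-cue score. The function name `fuse_gmms` and the `cue_scores` argument are hypothetical; the score is only a stand-in for whatever reliability and consensus guidance the paper actually uses.

```python
import numpy as np

def fuse_gmms(gmms, cue_scores):
    """Pool several GMMs into one mixture (illustrative assumption only).

    gmms: list of (weights, means, covariances) tuples, one per cue.
    cue_scores: one non-negative score per cue (hypothetical guidance value).
    """
    scores = np.asarray(cue_scores, dtype=float)
    scores = scores / scores.sum()  # normalise scores across cues

    # Scale each cue's component weights by that cue's normalised score.
    weights = np.concatenate(
        [s * np.asarray(w, dtype=float) for s, (w, _, _) in zip(scores, gmms)]
    )
    means = np.concatenate([np.asarray(m, dtype=float) for _, m, _ in gmms])
    covs = np.concatenate([np.asarray(c, dtype=float) for _, _, c in gmms])

    # Renormalise so the fused component weights sum to one.
    return weights / weights.sum(), means, covs

# Two toy cue GMMs over 3-D colour features, two components each.
eye = np.stack([np.eye(3), np.eye(3)])
gmm_a = (np.array([0.5, 0.5]), np.zeros((2, 3)), eye)
gmm_b = (np.array([0.25, 0.75]), np.ones((2, 3)), eye)

fused_w, fused_mu, fused_cov = fuse_gmms([gmm_a, gmm_b], cue_scores=[3.0, 1.0])
```

In this toy run the first cue is scored three times higher than the second, so its two components carry three quarters of the fused mixture's total weight. A real system would derive the scores from data rather than hard-code them.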
