CV, Jul 12, 2016

Weakly Supervised Learning of Heterogeneous Concepts in Videos

Sohil Shah, Kuldeep Kulkarni, Arijit Biswas, Ankit Gandhi, Om Deshmukh, Larry Davis

arXiv:1607.03240v13.02 citations

Originality: Incremental advance

AI Analysis

This work improves weakly supervised learning for video analysis, with applications such as content retrieval. The advance is incremental: it extends an existing method, the Indian Buffet Process, rather than introducing a new one.

The paper tackles the problem of classifying and localizing heterogeneous concepts in videos using weak textual descriptions, achieving a 24% relative improvement for pairwise concept classification on the Casablanca dataset and a 9% relative improvement for localization on the A2D dataset compared to baselines.

Typical textual descriptions that accompany online videos are 'weak': i.e., they mention the main concepts in the video but not their corresponding spatio-temporal locations. The concepts in the description are typically heterogeneous (e.g., objects, persons, actions). Certain location constraints on these concepts can also be inferred from the description. The goal of this paper is to present a generalization of the Indian Buffet Process (IBP) that can (a) systematically incorporate heterogeneous concepts in an integrated framework, and (b) enforce location constraints, for efficient classification and localization of the concepts in the videos. Finally, we develop posterior inference for the proposed formulation using mean-field variational approximation. Comparative evaluations on the Casablanca and the A2D datasets show that the proposed approach significantly outperforms other state-of-the-art techniques: 24% relative improvement for pairwise concept classification in the Casablanca dataset and 9% relative improvement for localization in the A2D dataset as compared to the most competitive baseline.
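For readers unfamiliar with the prior that the abstract builds on, the following is a minimal sketch of sampling a binary item-by-feature matrix from the standard (unmodified) Indian Buffet Process. It does not implement the paper's generalization, its heterogeneous concepts, its location constraints, or its variational inference. The function names `sample_poisson` and `sample_ibp` and the parameter names are hypothetical, chosen only for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method.

    Suitable for the small rates used here (alpha / i).
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_items, alpha, seed=0):
    """Sample a binary matrix Z from the standard IBP prior.

    Rows are items (e.g. video segments), columns are latent features
    (e.g. concepts). This is the textbook generative process, not the
    generalized model described in the abstract.
    """
    rng = random.Random(seed)
    rows = []          # one list of 0/1 entries per item
    feature_counts = []  # number of earlier items that own each feature

    for i in range(1, num_items + 1):
        row = []
        # Item i takes an existing feature k with probability m_k / i,
        # where m_k counts the earlier items that already own feature k.
        for k, m_k in enumerate(feature_counts):
            take = 1 if rng.random() < m_k / i else 0
            row.append(take)
            feature_counts[k] += take
        # Item i then introduces Poisson(alpha / i) brand-new features.
        new_features = sample_poisson(alpha / i, rng)
        row.extend([1] * new_features)
        feature_counts.extend([1] * new_features)
        rows.append(row)

    # Pad earlier rows with zeros so the matrix is rectangular.
    total = len(feature_counts)
    return [row + [0] * (total - len(row)) for row in rows]


if __name__ == "__main__":
    for row in sample_ibp(num_items=5, alpha=2.0, seed=1):
        print(row)
```

In this sketch a larger `alpha` yields more features per item on average, and every column is owned by at least one item because a feature only exists once some item has introduced it.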
