MM · CV · Jul 19, 2018

Few-Shot Adaptation for Multimedia Semantic Indexing

arXiv:1807.07203v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses robust multimedia indexing when labeled data is limited. It is rated an incremental advance because it builds on existing zero-shot and few-shot learning methods.

The paper tackles semantic indexing of image and video data by proposing a few-shot adaptation framework that bridges zero-shot learning and supervised many-shot learning. It reports 35.98% Mean Average Precision on the TRECVID 2014 dataset under the supervised condition, which the authors describe as, to the best of their knowledge, the best result on that dataset.
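For readers unfamiliar with the metric, Mean Average Precision averages, over all concepts, the average precision of each concept's ranked result list. The sketch below assumes the standard non-interpolated definition. The function names are illustrative, and the official TRECVID evaluation protocol may differ in detail.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision for one concept.

    scores: detector scores, one per shot or image (higher = more relevant).
    labels: 1 if the item truly contains the concept, else 0.
    """
    # Rank items from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            # Precision at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_concept):
    """per_concept: list of (scores, labels) pairs, one pair per concept."""
    aps = [average_precision(s, l) for s, l in per_concept]
    return sum(aps) / len(aps)


# Toy data (made up for illustration, not from the paper).
toy = [
    ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),
    ([0.1, 0.9], [1, 0]),
]
toy_map = mean_average_precision(toy)
```

In the first toy concept the relevant items sit at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2.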

We propose a few-shot adaptation framework for semantic indexing of image and video data that bridges zero-shot learning and supervised many-shot learning. Few-shot adaptation provides robust parameter estimation from few training examples by optimizing the parameters of zero-shot learning and supervised many-shot learning simultaneously. In this method, we first build a zero-shot detector and then update it using the few examples. Our experiments show the effectiveness of the proposed framework on three datasets: TRECVID Semantic Indexing 2010, TRECVID Semantic Indexing 2014, and ImageNet. On the ImageNet dataset, our method outperforms recent few-shot learning methods. On the TRECVID 2014 dataset, we achieve 15.19% and 35.98% Mean Average Precision under the zero-shot condition and the supervised condition, respectively. To the best of our knowledge, these are the best results on this dataset.
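The abstract does not spell out the objective, so the following is only a rough sketch of the general idea of starting from a zero-shot detector and updating it with a few examples. It swaps in a simple stand-in technique: logistic regression with an L2 penalty that pulls the weights toward the zero-shot weights. The names `few_shot_adapt`, `w_zero`, and `lam`, and the synthetic data, are all assumptions for illustration and not the authors' method.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def objective(w, w_zero, X, y, lam):
    """Mean logistic loss plus (lam / 2) * ||w - w_zero||^2."""
    signs = 2.0 * y - 1.0  # map labels {0, 1} to {-1, +1}
    loss = np.mean(np.logaddexp(0.0, -signs * (X @ w)))
    return loss + 0.5 * lam * np.sum((w - w_zero) ** 2)


def few_shot_adapt(w_zero, X, y, lam=1.0, lr=0.1, steps=200):
    """Hypothetical adaptation step: gradient descent starting at w_zero.

    The penalty keeps the solution near the zero-shot detector when only
    a handful of labeled examples is available.
    """
    w = w_zero.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y) + lam * (w - w_zero)
        w -= lr * grad
    return w


# Synthetic stand-in data (not from the paper).
rng = np.random.default_rng(0)
dim = 8
w_true = rng.normal(size=dim)
w_zero = w_true + 0.5 * rng.normal(size=dim)  # imperfect zero-shot detector
X = rng.normal(size=(5, dim))                 # five labeled examples
y = (X @ w_true > 0).astype(float)

w_adapted = few_shot_adapt(w_zero, X, y)
```

With a large `lam` the adapted detector stays close to the zero-shot one, and with a small `lam` it behaves more like an ordinary supervised classifier, which loosely mirrors the bridging behaviour the abstract describes.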

