CV | Dec 22, 2015

Multi-Instance Visual-Semantic Embedding

arXiv:1512.06963v1 | 42 citations
Originality: Incremental advance
AI Analysis

This paper addresses a general, open problem in computer vision with applications such as image classification, although the contribution appears incremental because it extends existing single-label embedding approaches to the multi-label setting.

The paper tackles the problem of embedding images with multiple labels by proposing a Multi-Instance visual-semantic Embedding model (MIE) that maps image subregions to their corresponding labels, and reports that it outperforms state-of-the-art methods on multi-label image annotation and zero-shot learning.

Visual-semantic embedding models have recently been proposed and shown to be effective for image classification and zero-shot learning; they work by mapping images into a continuous semantic label space. Although several approaches have been proposed for single-label embedding tasks, handling images with multiple labels (a more general setting) remains an open problem, mainly because of the complex underlying correspondence between an image and its labels. In this work, we present the Multi-Instance visual-semantic Embedding model (MIE) for embedding images associated with either single or multiple labels. Our model discovers semantically meaningful image subregions and maps them to their corresponding labels. We demonstrate the superiority of our method over the state of the art on two tasks: multi-label image annotation and zero-shot learning.
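The multi-instance idea can be illustrated with a minimal sketch, assuming that each image is represented by several subregion embeddings and each label by a word vector in the same space, and that a label's score is the similarity of its best-matching subregion. The function names (`label_score`, `rank_labels`, `ranking_hinge_loss`), the cosine similarity, the max-over-regions rule, and the margin value are all illustrative assumptions, not necessarily the paper's exact formulation; the vectors are hand-made 3-D stand-ins for CNN region features and label embeddings.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def label_score(regions: List[Vector], label_vec: Vector) -> float:
    """Assumed multi-instance score: the best-matching subregion decides the label's fit."""
    return max(cosine(r, label_vec) for r in regions)


def rank_labels(regions: List[Vector], label_vecs: Dict[str, Vector]) -> List[str]:
    """Rank candidate labels for one image by their multi-instance score (highest first)."""
    return sorted(label_vecs, key=lambda name: label_score(regions, label_vecs[name]), reverse=True)


def ranking_hinge_loss(regions: List[Vector], positives: List[str], negatives: List[str],
                       label_vecs: Dict[str, Vector], margin: float = 0.1) -> float:
    """Hypothetical pairwise ranking loss: every true label should outscore every
    false label by at least `margin`; violations are penalized linearly."""
    loss = 0.0
    for p in positives:
        sp = label_score(regions, label_vecs[p])
        for n in negatives:
            sn = label_score(regions, label_vecs[n])
            loss += max(0.0, margin - sp + sn)
    return loss


# Toy image with two subregions: one "dog-like", one "ball-like".
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels = {"dog": [1.0, 0.0, 0.0], "ball": [0.1, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}

print(rank_labels(regions, labels))  # ['dog', 'ball', 'car']
print(ranking_hinge_loss(regions, ["dog", "ball"], ["car"], labels))  # 0.0
```

Each true label is matched by a different subregion, so both "dog" and "ball" score highly even though no single region contains both. Because a label enters only through its vector, a label never seen in training can be scored the same way, which is what makes the zero-shot setting possible in this kind of embedding.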
