CV · Sep 7, 2015

Structured Prediction with Output Embeddings for Semantic Image Annotation

arXiv:1509.02130v1 · 19 citations
Originality: Incremental advance
AI Analysis

This paper addresses data sparsity in multi-class structured prediction for image annotation. It is an incremental advance: it builds on existing methods and adapts them to this task.

The paper tackles the problem of semantic image annotation with sparse data across hundreds of classes by incorporating feature representations of both inputs and outputs into a factorized log-linear model, resulting in improved overall predictions.

We address the task of annotating images with semantic tuples. Solving this problem requires an algorithm which is able to deal with hundreds of classes for each argument of the tuple. In such contexts, data sparsity becomes a key challenge, as there will be a large number of classes for which only a few examples are available. We propose handling this by incorporating feature representations of both the inputs (images) and outputs (argument classes) into a factorized log-linear model, and exploiting the flexibility of scoring functions based on bilinear forms. Experiments show that integrating feature representations of the outputs in the structured prediction model leads to better overall predictions. We also conclude that the best output representation is specific for each type of argument.
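To make the bilinear scoring idea in the abstract concrete, here is a minimal sketch of a log-linear classifier over one tuple argument. It assumes a score of the form s(x, y) = phi(x)^T W psi(y), where phi(x) is an image feature vector and psi(y) is a class embedding. The names `bilinear_score`, `class_probabilities`, `W`, and the toy dimensions are illustrative assumptions, not taken from the paper, and the random vectors stand in for real features.

```python
import math
import random


def bilinear_score(x_feat, W, y_emb):
    """Return s(x, y) = phi(x)^T W psi(y) for one input and one class."""
    return sum(
        x_feat[i] * W[i][j] * y_emb[j]
        for i in range(len(x_feat))
        for j in range(len(y_emb))
    )


def class_probabilities(x_feat, W, class_embs):
    """Log-linear model: softmax of the bilinear scores over all classes."""
    scores = [bilinear_score(x_feat, W, emb) for emb in class_embs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy setup (hypothetical sizes): 200 classes share one small matrix W,
# so rare classes borrow strength through their embeddings.
random.seed(0)
D_IN, D_OUT, N_CLASSES = 4, 3, 200
x_feat = [random.gauss(0, 1) for _ in range(D_IN)]
class_embs = [[random.gauss(0, 1) for _ in range(D_OUT)] for _ in range(N_CLASSES)]
W = [[random.gauss(0, 0.1) for _ in range(D_OUT)] for _ in range(D_IN)]

probs = class_probabilities(x_feat, W, class_embs)
best_class = max(range(N_CLASSES), key=lambda k: probs[k])
```

In this sketch the number of parameters in `W` depends on the feature and embedding dimensions rather than on the number of classes, which is one way output representations can help when many classes have only a few examples.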

