CV · Mar 6, 2020

Captioning Images with Novel Objects via Online Vocabulary Expansion

arXiv:2003.03305v1 · 2 citations
AI Analysis

This paper addresses the cost of data collection and retraining that image captioning systems incur when they encounter novel objects, though the contribution appears incremental because it builds on existing models.

The paper tackles the problem of generating image captions that mention novel objects without costly retraining. It proposes estimating the word embeddings of those objects from a small number of their image features. The method can be integrated with general image-captioning models, and the reported experiments support its effectiveness.

In this study, we introduce a low-cost method for generating descriptions of images containing novel objects. Constructing a model that can describe images with novel objects is generally costly for two reasons: (1) a large amount of data must be collected for each category, and (2) the entire system must be retrained. When humans see a small number of novel objects, they are able to estimate the objects' properties by associating their appearance with known objects. Accordingly, we propose a method that can describe images with novel objects without retraining, using word embeddings of the objects that are estimated from only a small number of image features of those objects. The method can be integrated with general image-captioning models. The experimental results show the effectiveness of our approach.
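The abstract does not say how the word embeddings are estimated, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a novel object's embedding can be approximated as a similarity-weighted average of known objects' word embeddings, where the weights come from comparing a few image features of the novel object with one prototype feature per known object. The function names, the `temperature` parameter, and the toy vectors are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def estimate_embedding(novel_features, known_prototypes, known_embeddings,
                       temperature=0.1):
    """Hypothetical estimator (an assumption, not the paper's formula).

    Averages the few image features of the novel object, scores that
    average against each known object's prototype feature, turns the
    scores into softmax weights, and returns the weighted average of
    the known objects' word embeddings.
    """
    query = mean_vector(novel_features)
    names = list(known_prototypes)
    sims = [cosine(query, known_prototypes[name]) for name in names]

    # Numerically stable softmax over the similarities.
    top = max(sims)
    exps = [math.exp((s - top) / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(known_embeddings[names[0]])
    return [
        sum(w * known_embeddings[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]


# Toy data: two known objects and two image features of a novel "zebra".
known_prototypes = {"horse": [1.0, 0.0, 0.1], "car": [0.0, 1.0, 0.1]}
known_embeddings = {"horse": [1.0, 0.0], "car": [0.0, 1.0]}
zebra_features = [[0.9, 0.1, 0.1], [1.0, 0.05, 0.2]]

zebra_vec = estimate_embedding(zebra_features, known_prototypes, known_embeddings)
```

In this toy setup the zebra features resemble the horse prototype, so the estimated vector lies closer to the "horse" embedding than to the "car" embedding. A captioning model that scores words by their embeddings could then use such a vector for the new word without updating any of its weights, which is the property the abstract emphasizes.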
