CV · Mar 28, 2022

NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge

Duc Minh Vo, Hong Chen, Akihiro Sugimoto, Hideki Nakayama

arXiv:2203.14499v1 · 10.621 citations · h-index: 29

Originality: Incremental advance

AI Analysis

The paper addresses the problem of describing unseen objects in image captioning, offering an incremental improvement by replacing the object detection step with vocabulary retrieval.

The paper tackles novel object captioning by retrieving vocabulary from external knowledge, stored as embeddings of object definitions from Wiktionary. This removes the need for an object detection model and lets the captioner describe objects absent from the training data. Experiments on the held-out COCO and Nocaps datasets show the method is effective compared with state-of-the-art approaches.

Novel object captioning aims at describing objects absent from training data, with the key ingredient being the provision of object vocabulary to the model. Although existing methods rely heavily on an object detection model, we view the detection step as vocabulary retrieval from external knowledge, stored as embeddings of object definitions from Wiktionary; retrieval uses image region features learned by a transformer model. We propose an end-to-end Novel Object Captioning with Retrieved vocabulary from External Knowledge method (NOC-REK), which simultaneously learns vocabulary retrieval and caption generation, successfully describing novel objects outside of the training dataset. Furthermore, our model eliminates the need for retraining: the external knowledge is simply updated whenever a novel object appears. Our comprehensive experiments on the held-out COCO and Nocaps datasets show that NOC-REK is considerably effective against state-of-the-art methods.
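
The retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `retrieve_vocabulary`), the 3-dimensional toy vectors, and the use of cosine similarity as the matching score are all assumptions made for illustration. In the paper, the region features and definition embeddings are learned, and retrieval is trained jointly with caption generation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_vocabulary(region_features, knowledge, top_k=1):
    """For each image region, pick the top_k words whose definition
    embedding is most similar to the region feature.

    `knowledge` is a hypothetical stand-in for the external knowledge:
    a dict mapping a word to the embedding of its definition.
    """
    words = []
    for feat in region_features:
        ranked = sorted(knowledge, key=lambda w: cosine(feat, knowledge[w]), reverse=True)
        for word in ranked[:top_k]:
            if word not in words:  # keep each retrieved word once, in order
                words.append(word)
    return words


# Toy external knowledge (assumed values, not real Wiktionary embeddings).
knowledge = {
    "dog": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
}

# Toy region features for one image (assumed values).
regions = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]]

before = retrieve_vocabulary(regions, knowledge)

# A novel object appears: only the knowledge store is updated, nothing is retrained.
knowledge["zebra"] = [0.0, 0.0, 1.0]
after = retrieve_vocabulary(regions, knowledge)

print(before)  # ['dog', 'table']
print(after)   # ['dog', 'zebra']
```

In this toy setup the second region is matched to the nearest available word until an entry for the novel object is added, which mirrors the abstract's claim that updating the external knowledge is enough to cover a new object.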
