CV · Sep 5, 2023

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu

NVIDIA · U of Toronto

arXiv:2309.01961v3 · 9.1 · 11 citations · h-index: 88

Originality: Synthesis-oriented

AI Analysis

This work addresses zero-shot image captioning for the computer vision community. Its contribution is incremental: it reports an evaluation benchmark and challenge outcomes rather than proposing a novel method.

The paper introduces the NICE project and its 2023 challenge, which aims to foster robust zero-shot image captioning models. Entries were evaluated on a new dataset covering diverse visual concepts, and no challenge-specific training data was provided. The goal is to advance the state of the art in both accuracy and fairness.

In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge. The project is designed to challenge the computer vision community to develop robust image captioning models that advance the state of the art in terms of both accuracy and fairness. In the challenge, image captioning models were tested on a new evaluation dataset that includes a large variety of visual concepts from many domains. No specific training data was provided for the challenge, so entries had to adapt to new types of image descriptions that had not been seen during training. This report describes the newly proposed NICE dataset, the evaluation methods, the challenge results, and technical details of the top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks.

