NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
This work targets zero-shot image captioning for the computer vision community, but its contribution is incremental: it reports an evaluation dataset and challenge outcomes rather than a novel method.
The paper introduces the NICE project and its 2023 challenge, which tested zero-shot image captioning models on a new evaluation dataset covering diverse visual concepts. No challenge-specific training data was provided, and the stated goal is to advance both accuracy and fairness.
In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge. The project is designed to challenge the computer vision community to develop robust image captioning models that advance the state of the art in both accuracy and fairness. In the challenge, image captioning models were tested on a new evaluation dataset that covers a large variety of visual concepts from many domains. No specific training data was provided, so challenge entries had to adapt to new types of image descriptions that had not been seen during training. This report describes the newly proposed NICE dataset, the evaluation methods, the challenge results, and the technical details of the top-ranking entries. We expect the outcomes of the challenge to contribute to the improvement of AI models on various vision-language tasks.