CL CV · Feb 27, 2020

Analysis of diversity-accuracy tradeoff in image captioning

arXiv:2002.11848v11.615 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the tradeoff between diversity and accuracy in image captioning, which matters for applications such as accessibility and content generation. The contribution is incremental, since it builds on existing methods and metrics.

The study examines how model architecture, training objective, hyperparameter settings, and decoding procedure affect the diversity and accuracy of generated captions. It finds that naive sampling at low temperature yields competitive diversity and accuracy, whereas CIDEr-based reinforcement learning reduces diversity. It also introduces AllSPICE, a metric that evaluates both the accuracy and the diversity of a caption set in a single value.

We investigate the effect of different model architectures, training objectives, hyperparameter settings and decoding procedures on the diversity of automatically generated image captions. Our results show that 1) simple decoding by naive sampling, coupled with low temperature, is a competitive and fast method to produce diverse and accurate caption sets; 2) training with a CIDEr-based reward using reinforcement learning harms the diversity properties of the resulting generator, and this loss cannot be mitigated by manipulating decoding parameters. In addition, we propose a new metric, AllSPICE, for evaluating both accuracy and diversity of a set of captions by a single value.
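To make "naive sampling, coupled with low temperature" concrete, here is a minimal sketch of temperature-scaled sampling used to draw a set of captions. It is not the authors' implementation: the function names, the `next_logits` callback (a stand-in for a captioning model's next-token scores), the `eos_id` convention, and the default temperature of 0.5 are all assumptions made for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature < 1 sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token id from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_caption_set(next_logits, num_captions=5, max_len=20,
                       temperature=0.5, eos_id=0, seed=0):
    """Sample several captions independently (hypothetical helper).

    next_logits(prefix) is assumed to return one score per vocabulary
    entry given the token ids generated so far.
    """
    rng = random.Random(seed)
    captions = []
    for _ in range(num_captions):
        tokens = []
        for _ in range(max_len):
            token = sample_token(next_logits(tokens), temperature, rng)
            if token == eos_id:  # stop at the assumed end-of-sequence id
                break
            tokens.append(token)
        captions.append(tokens)
    return captions
```

Because each caption is drawn independently, the set is produced without any search over candidates; lowering the temperature concentrates probability mass on high-scoring tokens, which trades some diversity for accuracy.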
