CV · AI · Sep 2, 2023

A Fine-Grained Image Description Generation Method Based on Joint Objectives

Yifan Zhang, Chunzhen Lin, Donglin Cao, Dazhen Lin

arXiv:2311.12799v1

Originality: Incremental advance

AI Analysis

This work addresses a specific computer vision problem, generating more accurate and detailed image descriptions, and represents an incremental advance in the field.

The paper tackles the challenges of description repetition and omission in fine-grained image captioning by proposing a model that combines image-level and object-level visual features with an object penalty mechanism, resulting in significant improvements in the CIDEr evaluation metric.

The goal of fine-grained image description generation techniques is to learn detailed information from images and simulate human-like descriptions that provide coherent and comprehensive textual details about the image content. Currently, most of these methods face two main challenges: description repetition and omission. Moreover, the existing evaluation metrics cannot clearly reflect the performance of models on these two issues. To address these challenges, we propose an innovative Fine-grained Image Description Generation model based on Joint Objectives. Furthermore, we introduce new object-based evaluation metrics to more intuitively assess the model's performance in handling description repetition and omission. This novel approach combines visual features at both the image level and object level to maximize their advantages and incorporates an object penalty mechanism to reduce description repetition. Experimental results demonstrate that our proposed method significantly improves the CIDEr evaluation metric, indicating its excellent performance in addressing description repetition and omission issues.
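The abstract does not define the object-based evaluation metrics, so the following is only a minimal sketch of how repetition and omission could be measured at the object level. The function name `object_repetition_and_omission`, its inputs, and both rate definitions are assumptions made for illustration, not the authors' formulation. It assumes the objects mentioned in a generated description and the objects present in the image have already been extracted as label lists.

```python
from collections import Counter


def object_repetition_and_omission(described_objects, reference_objects):
    """Return (repetition_rate, omission_rate) for one description.

    Hypothetical definitions, not taken from the paper:
      repetition_rate = repeated mentions / all object mentions
      omission_rate   = reference objects never mentioned / reference objects

    described_objects: object labels in the order the description mentions them
    reference_objects: object labels present in the image (e.g. annotations)
    """
    mentions = Counter(described_objects)
    total_mentions = sum(mentions.values())

    # Every mention of an object after its first counts as a repetition.
    repeated = sum(count - 1 for count in mentions.values())
    repetition_rate = repeated / total_mentions if total_mentions else 0.0

    # An object is omitted if it is in the image but never described.
    reference = set(reference_objects)
    omitted = reference - set(mentions)
    omission_rate = len(omitted) / len(reference) if reference else 0.0

    return repetition_rate, omission_rate


if __name__ == "__main__":
    described = ["dog", "ball", "dog", "grass"]
    reference = ["dog", "ball", "grass", "tree"]
    print(object_repetition_and_omission(described, reference))
```

Under these assumed definitions, a lower value is better for both rates, which separates the two failure modes that a single score such as CIDEr folds together.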
