CV | AI | Dec 20, 2024

Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation

arXiv:2412.15939v2 | 3 citations | h-index: 6 | WACV
Originality: Incremental advance
AI Analysis

This work addresses the problem of detecting and describing fine-grained differences in complex images, for applications such as content verification. It is an incremental advance, as it builds on existing models and methods.

The paper tackles the challenge of describing differences between real-world images by adapting an existing image captioning model to the Image Difference Captioning task and by using synthetic data augmentation. The resulting model, BLIP2IDC, outperforms two-stream approaches by a significant margin, and the augmentation strategy yields a new dataset, Syned1.

The improving quality of generative models in recent years has enabled the generation of edited variations of images at a large scale. To counter the harmful effects of such technology, the Image Difference Captioning (IDC) task aims to describe the differences between two images. While this task is handled successfully for simple 3D-rendered images, existing approaches struggle on real-world images. The reason is twofold: the scarcity of training data and the difficulty of capturing fine-grained differences between complex images. To address these issues, we propose a simple yet effective framework to both adapt existing image captioning models to the IDC task and augment IDC datasets. We introduce BLIP2IDC, an adaptation of BLIP2 to the IDC task at low computational cost, and show that it outperforms two-stream approaches by a significant margin on real-world IDC datasets. We also propose to use synthetic augmentation to improve the performance of IDC models in a model-agnostic fashion. We show that our synthetic augmentation strategy provides high-quality data, leading to a challenging new dataset well suited for IDC, named Syned1.

Code Implementations: 1 repo
