CL · CV · Jun 10, 2021

Data augmentation to improve robustness of image captioning solutions

arXiv:2106.05437v1 · 1 citation
Originality Synthesis-oriented
AI Analysis

This paper addresses robustness issues in image captioning for real-world applications, but it is incremental: it applies known augmentation techniques to one specific flaw, motion blur.

The paper tackles the degradation of image captioning performance under motion blur by applying data augmentation at the object detection and captioning stages. For high blur intensity, this reduces CIDEr-D degradation from 68.7 to 11.7 on MS COCO and from 22.4 to 6.8 on VizWiz.

In this paper, we study the impact of motion blur, a common quality flaw in real-world images, on a state-of-the-art two-stage image captioning solution, and observe that its performance degrades as blur intensity increases. We investigate techniques to improve the robustness of the solution to motion blur using training data augmentation at either or both stages of the solution, i.e., object detection and captioning, and observe improved results. In particular, augmenting both stages reduces the CIDEr-D degradation for high motion blur intensity from 68.7 to 11.7 on the MS COCO dataset, and from 22.4 to 6.8 on the VizWiz dataset.
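To make the idea of motion-blur training augmentation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the blur model, so the horizontal box kernel, the kernel sizes `(3, 5, 7)`, the probability `p`, and the function names `motion_blur_kernel`, `apply_horizontal_blur`, and `augment` are all assumptions chosen for illustration. The sketch covers only the image transform, not how it is wired into the object detection or captioning training stages.

```python
import random


def motion_blur_kernel(size):
    """Return a 1-D box kernel of odd length `size` whose weights sum to 1.

    Assumption: linear horizontal motion is modelled as a uniform average.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    return [1.0 / size] * size


def apply_horizontal_blur(image, size):
    """Blur a grayscale image (list of rows of floats) along each row.

    Positions past the border are clamped to the nearest edge pixel.
    """
    kernel = motion_blur_kernel(size)
    half = size // 2
    blurred = []
    for row in image:
        width = len(row)
        out_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                src = min(max(x + k - half, 0), width - 1)
                acc += weight * row[src]
            out_row.append(acc)
        blurred.append(out_row)
    return blurred


def augment(image, sizes=(3, 5, 7), p=0.5, rng=random):
    """With probability `p`, blur the image with a randomly chosen kernel size.

    Larger sizes stand in for higher blur intensity (hypothetical levels).
    """
    if rng.random() < p:
        return apply_horizontal_blur(image, rng.choice(sizes))
    return image
```

In a training pipeline, a function like `augment` would typically be applied to each training image before it is fed to the model, so that the model sees both sharp and blurred versions over the course of training.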
