CV
Sep 19, 2019

ContCap: A scalable framework for continual image captioning

Giang Nguyen, Tae Joon Jun, Trung Tran, Tolcha Yalew, Daeyoung Kim

arXiv:1909.08745v2 10.213 citations

Originality Incremental advance

AI Analysis

This work addresses the incremental problem of integrating continual learning into image captioning for researchers and practitioners in computer vision.

The paper tackles the problem of catastrophic forgetting in image captioning by introducing a continual learning framework, ContCap, which shows significant improvements in performance on old tasks and surpasses fine-tuning on new tasks.

While advanced image captioning systems increasingly describe images coherently and accurately, recent progress in continual learning allows deep learning models to avoid catastrophic forgetting. However, the combination of image captioning and continual learning has not yet been explored. We define the task that combines the two as continual image captioning. In this work, we propose ContCap, a framework that generates captions over a sequence of incoming tasks, integrating continual learning into image captioning while addressing catastrophic forgetting. After demonstrating that forgetting occurs in image captioning, we propose several techniques to overcome it, taking a simple fine-tuning scheme as the baseline. We split the MS-COCO 2014 dataset to run experiments in class-incremental settings without revisiting the data of previously seen tasks. Experiments show remarkable improvements in performance on the old tasks, while the results on the new tasks surprisingly surpass fine-tuning. Our framework also offers a scalable solution for continual image or video captioning.
