CV · May 21, 2018

Turbo Learning for Captionbot and Drawingbot

arXiv:1805.08170v2 · 26 citations
AI Analysis

The paper targets two multimodal tasks, image captioning and text-to-image generation, and is relevant to researchers and practitioners working on either one. The contribution appears incremental: it builds on existing captioning and generation models and adds a new joint training scheme.

The paper tackles image captioning and text-to-image generation with a turbo learning approach that jointly trains a CaptionBot and a DrawingBot through closed-loop feedback. Experiments on the COCO dataset show significant improvements for both models.

In this paper, we study both image captioning and text-to-image generation, and present a novel turbo learning approach for jointly training an image-to-text generator (a.k.a. CaptionBot) and a text-to-image generator (a.k.a. DrawingBot). The key idea behind the joint training is that image-to-text generation and text-to-image generation, as dual problems, can form a closed loop that provides informative feedback to each other. Based on this feedback, we introduce a new loss metric that compares the original input with the output produced by the closed loop. In addition to the original loss metrics used in CaptionBot and DrawingBot, this extra loss metric makes the jointly trained CaptionBot and DrawingBot outperform their separately trained counterparts. Furthermore, the turbo learning approach enables semi-supervised learning, since the closed loop can provide pseudo-labels for unlabeled samples. Experimental results on the COCO dataset demonstrate that turbo learning improves the performance of both CaptionBot and DrawingBot by a large margin.
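The closed-loop loss described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: images and captions are replaced by short numeric vectors, mean squared error stands in for the real per-modality losses, and the names `turbo_losses`, `unlabeled_image_loss`, and the weight `lam` are hypothetical. The sketch only shows how the extra term compares the original input with what comes back around the loop, and how an unlabeled image can still yield a loss through a pseudo-caption.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical stand-ins: a "bot" is any function mapping one vector to another.
Model = Callable[[Sequence[float]], Vector]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error, used here as a placeholder for the real losses."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def turbo_losses(image: Sequence[float], caption: Sequence[float],
                 caption_bot: Model, drawing_bot: Model,
                 lam: float = 1.0) -> Tuple[float, float]:
    """Return (captionbot_loss, drawingbot_loss) for one paired sample.

    The split of loop terms between the two bots and the weight `lam`
    are assumptions made for illustration.
    """
    # Open-loop losses: each bot against its own ground-truth target.
    cap_pred = caption_bot(image)
    img_pred = drawing_bot(caption)
    # Closed-loop losses: compare the ORIGINAL input with what comes back.
    image_back = drawing_bot(cap_pred)    # image -> text -> image
    caption_back = caption_bot(img_pred)  # text -> image -> text
    cap_loss = mse(cap_pred, caption) + lam * mse(image_back, image)
    draw_loss = mse(img_pred, image) + lam * mse(caption_back, caption)
    return cap_loss, draw_loss


def unlabeled_image_loss(image: Sequence[float],
                         caption_bot: Model, drawing_bot: Model) -> float:
    """Semi-supervised case: an image with no caption.

    The CaptionBot output serves as a pseudo-label, so only the
    closed-loop (image -> text -> image) term is available.
    """
    pseudo_caption = caption_bot(image)
    return mse(drawing_bot(pseudo_caption), image)


if __name__ == "__main__":
    def caption_bot(v): return [2 * x for x in v]        # toy CaptionBot
    def drawing_bot(v): return [x / 2 + 1 for x in v]    # toy, imperfect DrawingBot
    print(turbo_losses([1.0, 2.0], [2.0, 4.0], caption_bot, drawing_bot))
    print(unlabeled_image_loss([1.0, 2.0], caption_bot, drawing_bot))
```

With a CaptionBot that doubles each value and a DrawingBot that exactly halves it, every term is zero; shifting the DrawingBot output by 1 makes both closed-loop terms nonzero, which is the signal the extra loss is meant to supply. The sketch only computes loss values; a trainable system would also need gradients to flow through both bots, which it does not model.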
