Image Generation from Image Captioning -- Invertible Approach
This work addresses the challenge of dual-task learning in computer vision with reduced training effort, though it appears incremental because it builds on existing invertible methods.
The authors tackle the problem of performing both image captioning and image generation with a single model trained only on captioning. They develop an invertible neural network that maps between image and text embeddings, so that images can be generated by inverting the network, without extra training.
Our work aims to build a model that performs the dual tasks of image captioning and image generation while being trained on only one of them. The central idea is to train an invertible model that learns a one-to-one mapping between image and text embeddings. Once the invertible model has been efficiently trained on one task, image captioning, the same model can generate new images for a given text through the inversion process, with no additional training. This paper proposes a simple invertible neural network architecture for this problem and presents our current findings.
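To make the inversion idea concrete, the following is a minimal sketch of an invertible mapping between embedding vectors. It is not the architecture proposed in the paper, which the abstract does not specify. It assumes additive coupling layers (in the style of NICE) as a stand-in, and the names `AdditiveCoupling`, `InvertibleMapper`, `image_to_text`, and `text_to_image` are hypothetical. The weights are random and untrained, and the sketch operates only on embedding vectors of an assumed even dimension. It illustrates just the property the abstract relies on: the forward pass can be undone exactly, so no second model is needed for the reverse direction.

```python
import numpy as np


class AdditiveCoupling:
    """Additive coupling layer (assumed stand-in, not the paper's layer).

    Splits the input into halves (a, b) and maps b -> b + t(a).
    Because a passes through unchanged, the step is exactly invertible.
    """

    def __init__(self, dim, hidden, rng, flip=False):
        assert dim % 2 == 0, "this sketch assumes an even embedding size"
        half = dim // 2
        self.flip = flip  # alternate which half is updated
        self.w1 = rng.normal(scale=0.1, size=(half, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, half))

    def _shift(self, a):
        # Small MLP t(.); it does not need to be invertible itself.
        return np.tanh(a @ self.w1) @ self.w2

    def _apply(self, x, sign):
        a, b = np.split(x, 2, axis=-1)
        if self.flip:
            a, b = b, a
        b = b + sign * self._shift(a)
        if self.flip:
            a, b = b, a
        return np.concatenate([a, b], axis=-1)

    def forward(self, x):
        return self._apply(x, +1.0)

    def inverse(self, y):
        return self._apply(y, -1.0)


class InvertibleMapper:
    """Stack of coupling layers mapping image embeddings to text embeddings."""

    def __init__(self, dim, hidden=32, n_layers=4, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [
            AdditiveCoupling(dim, hidden, rng, flip=(i % 2 == 1))
            for i in range(n_layers)
        ]

    def image_to_text(self, img_emb):
        # Forward direction: the one that would be trained (captioning).
        h = img_emb
        for layer in self.layers:
            h = layer.forward(h)
        return h

    def text_to_image(self, txt_emb):
        # Inverse direction: same weights, layers undone in reverse order.
        h = txt_emb
        for layer in reversed(self.layers):
            h = layer.inverse(h)
        return h


# Round trip on random placeholder embeddings (not real image features).
model = InvertibleMapper(dim=8)
img_emb = np.random.default_rng(1).normal(size=(3, 8))
txt_emb = model.image_to_text(img_emb)
recovered = model.text_to_image(txt_emb)
print(np.allclose(recovered, img_emb))
```

In this sketch the reverse direction reuses the forward weights and only flips the sign of the shift, which is why training the forward direction alone would be enough to define both mappings.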