Learning Compact Reward for Image Captioning
This work addresses reward ambiguity in image captioning, an incremental step toward generating more natural and diverse descriptions.
The paper tackles the problem of vague and ill-defined rewards in adversarial learning for image captioning by proposing a refined Adversarial Inverse Reinforcement Learning (rAIRL) method that disentangles the reward for each word and stabilizes training, achieving improved performance on the MS COCO and Flickr30K datasets.
Adversarial learning has advanced the generation of natural and diverse descriptions in image captioning. However, the reward learned by existing adversarial methods is vague and ill-defined due to the reward ambiguity problem. In this paper, we propose a refined Adversarial Inverse Reinforcement Learning (rAIRL) method that handles the reward ambiguity problem by disentangling the reward for each word in a sentence, and that achieves stable adversarial training by refining the loss function to shift the generator towards Nash equilibrium. In addition, we introduce a conditional term in the loss function to mitigate mode collapse and to increase the diversity of the generated descriptions. Our experiments on MS COCO and Flickr30K show that our method can learn a compact reward for image captioning.
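To make the idea of a per-word reward concrete, the following is a minimal sketch of the standard AIRL discriminator applied word by word. It is not the refined loss proposed in the paper, which the abstract does not spell out. The function names, the reward-function outputs `f_values`, and the generator probabilities `policy_probs` are hypothetical values chosen only for illustration.

```python
import math

def airl_discriminator(f_value, policy_prob):
    """Standard AIRL discriminator: D = exp(f) / (exp(f) + pi)."""
    ef = math.exp(f_value)
    return ef / (ef + policy_prob)

def per_word_rewards(f_values, policy_probs):
    """Reward for each word: log D - log(1 - D), which equals f - log pi."""
    rewards = []
    for f, p in zip(f_values, policy_probs):
        d = airl_discriminator(f, p)
        rewards.append(math.log(d) - math.log(1.0 - d))
    return rewards

# Hypothetical values for a three-word caption (assumed, not from the paper)
f_values = [0.5, -0.2, 1.0]      # learned reward-function output per word
policy_probs = [0.3, 0.6, 0.1]   # generator probability of each chosen word
rewards = per_word_rewards(f_values, policy_probs)
```

Because each word receives its own value, the reward signal is attached to individual word choices rather than to the sentence as a whole, which is the kind of disentangling the abstract describes.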