Interactive Machine Learning for Image Captioning
This work addresses the need for more efficient use of human feedback in image captioning, but the contribution is incremental: it builds on existing methods rather than introducing a new paradigm.
The paper tackles the problem of expensive human feedback in training image captioning models by proposing an interactive learning approach that multiplies feedback through data augmentation and integrates it smartly into the model.
We propose an approach to interactive learning for an image captioning model. Because human feedback is expensive and modern neural-network-based approaches often require large amounts of supervised training data, we envision a system that exploits human feedback as effectively as possible: it multiplies the feedback using data augmentation methods and integrates the resulting training examples into the model in a smart way. This approach has three key components, for which we need to find suitable practical implementations: feedback collection, data augmentation, and model update. We outline our idea and review different possibilities for addressing these tasks.
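To make the three components concrete, the following is a minimal sketch of one round of the loop (feedback collection, data augmentation, model update). It is not the authors' implementation: the names `Example`, `SYNONYMS`, `collect_feedback`, `augment`, `ToyCaptioner`, and `interactive_step` are hypothetical, synonym substitution merely stands in for whatever augmentation method is chosen, and the "model" is a lookup table rather than a neural captioner.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Example:
    """One training pair: an image identifier and a caption."""
    image_id: str
    caption: str


# Hypothetical toy synonym table; a real system would use a proper
# paraphrasing or augmentation method instead.
SYNONYMS = {"dog": ["puppy", "hound"], "runs": ["sprints"]}


def collect_feedback(image_id, proposed_caption, corrected_caption):
    """Feedback collection (stand-in): the user sees the proposed caption
    and supplies a corrected one, which becomes a training example."""
    return Example(image_id, corrected_caption)


def augment(example, synonyms=SYNONYMS):
    """Data augmentation: multiply one feedback example by swapping a
    single word for each of its listed synonyms."""
    words = example.caption.split()
    variants = [example]
    for i, word in enumerate(words):
        for synonym in synonyms.get(word, []):
            new_words = words[:i] + [synonym] + words[i + 1:]
            variants.append(Example(example.image_id, " ".join(new_words)))
    return variants


@dataclass
class ToyCaptioner:
    """Placeholder model: memorizes captions per image instead of
    performing gradient updates."""
    memory: dict = field(default_factory=dict)

    def update(self, examples):
        """Model update: integrate the augmented examples."""
        for ex in examples:
            self.memory.setdefault(ex.image_id, []).append(ex.caption)

    def caption(self, image_id):
        captions = self.memory.get(image_id)
        return captions[0] if captions else "<unknown>"


def interactive_step(model, image_id, corrected_caption):
    """One round: propose, collect feedback, augment, update."""
    proposed = model.caption(image_id)
    feedback = collect_feedback(image_id, proposed, corrected_caption)
    batch = augment(feedback)
    model.update(batch)
    return batch
```

In this sketch a single corrected caption such as "a dog runs" yields several training examples, which illustrates the idea of multiplying feedback; how the update step should weigh new examples against what the model already knows is left open here.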