Image Captioning
The paper addresses the problem of generating descriptive captions from images, with applications such as video annotation. It appears incremental: it builds on existing methods and does not claim a major breakthrough.
The paper evaluates model accuracy and language fluency through experiments on labeled datasets, applies the model to video captioning, and discusses the challenges encountered.
This paper presents and discusses the results of our experiments on image captioning. Image captioning is considerably more involved than image recognition or classification, because it additionally requires recognizing the interdependence between the objects and concepts in an image and composing a succinct sentence that narrates them. Experiments on several labeled datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. As a toy application, we apply image captioning to generate video captions, and we advance a few hypotheses about the challenges we encountered.
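
The abstract does not say how the video captions are produced. One simple possibility, sketched here purely as an assumption, is to caption sampled frames independently and merge consecutive frames that receive the same caption. The names `caption_video`, `caption_fn`, and `stride` are hypothetical and do not come from the paper; `caption_fn` stands in for whatever image captioning model is used.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def caption_video(
    frames: Sequence[Frame],
    caption_fn: Callable[[Frame], str],
    fps: float,
    stride: int = 30,
) -> List[Tuple[float, float, str]]:
    """Caption every `stride`-th frame and merge runs of identical captions.

    Returns a list of (start_seconds, end_seconds, caption) segments.
    This is an illustrative sketch, not the method used in the paper.
    """
    if fps <= 0 or stride <= 0:
        raise ValueError("fps and stride must be positive")

    segments: List[Tuple[float, float, str]] = []
    for index in range(0, len(frames), stride):
        text = caption_fn(frames[index])
        start = index / fps
        # The sampled frame is taken to represent frames up to the next sample.
        end = min(index + stride, len(frames)) / fps
        if segments and segments[-1][2] == text:
            # Same caption as the previous sample: extend that segment.
            prev_start = segments[-1][0]
            segments[-1] = (prev_start, end, text)
        else:
            segments.append((start, end, text))
    return segments


# Usage with a stand-in captioner (hypothetical; a real model would go here).
fake_frames = ["dog", "dog", "dog", "cat", "cat", "cat"]
segments = caption_video(
    fake_frames, lambda frame: f"a {frame} on grass", fps=1.0, stride=1
)
```

Captioning frames independently ignores motion and temporal context, which is one plausible source of the kind of challenges the abstract alludes to.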