Contextualize, Show and Tell: A Neural Visual Storyteller
This paper addresses the problem of automated visual storytelling, which has applications such as assistive technology and entertainment; however, the contribution is incremental, since it builds directly on prior work.
The paper tackles the generation of short stories from image sequences by extending an existing image description model with an encoder LSTM that provides context and multiple decoder LSTMs that each produce one portion of the story. It achieves competitive METEOR scores and human ratings in the internal track of the Visual Storytelling Challenge 2018.
We present a neural model for generating short stories from image sequences, which extends the image description model of Vinyals et al. (2015). The extension uses an encoder LSTM to compute a context vector for each story from its image sequence. This context vector serves as the initial state of multiple independent decoder LSTMs. Each decoder generates the portion of the story corresponding to one image in the sequence, taking that image's embedding as its first input. Our model achieved competitive results on both the METEOR metric and human ratings in the internal track of the Visual Storytelling Challenge 2018.
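The encoder-decoder arrangement described in the abstract can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation, and many details are assumptions made only for illustration:

- the weights are random and untrained;
- the sizes `EMB` and `HID`, the toy `VOCAB`, and the five stand-in image embeddings are hypothetical;
- the encoder's final `(h, c)` pair is used as the context state;
- all decoders share one word-embedding table;
- decoding is greedy.

"Independent" decoders are read here as decoders with separate parameters, one per image position. Each decoder receives its image embedding first and then a `<start>` token, following the Show and Tell convention. It then emits words until it produces `<end>` or reaches a length limit.

```python
import math
import random

# Hypothetical toy sizes and vocabulary (not taken from the paper).
EMB = 4  # size of image and word embeddings
HID = 6  # LSTM hidden and cell size
VOCAB = ["<start>", "<end>", "we", "went", "to", "the", "beach", "fun"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """Standard LSTM cell with random, untrained weights."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid = hid_dim
        # One matrix per gate (input, forget, output, candidate) over [x; h].
        self.w = {g: rand_matrix(hid_dim, in_dim + hid_dim, rng) for g in "ifoc"}
        self.b = {g: [0.0] * hid_dim for g in "ifoc"}

    def step(self, x, state):
        h, c = state
        xh = x + h  # list concatenation gives [x; h]
        pre = {g: [a + b for a, b in zip(matvec(self.w[g], xh), self.b[g])]
               for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def encode(images, encoder):
    """Run the encoder over the image sequence; its final (h, c) is the context."""
    state = ([0.0] * encoder.hid, [0.0] * encoder.hid)
    for img in images:
        state = encoder.step(img, state)
    return state


class Decoder:
    """One decoder per image position, with its own parameters."""

    def __init__(self, rng):
        self.cell = LSTMCell(EMB, HID, rng)
        self.out = rand_matrix(len(VOCAB), HID, rng)  # hidden -> vocabulary scores

    def generate(self, context, image, word_emb, max_len=5):
        state = self.cell.step(image, context)  # image embedding is the first input
        state = self.cell.step(word_emb[0], state)  # then the <start> token
        words = []
        for _ in range(max_len):
            scores = matvec(self.out, state[0])
            # Greedy choice, never re-emitting <start> (index 0).
            idx = max(range(1, len(VOCAB)), key=lambda k: scores[k])
            if VOCAB[idx] == "<end>":
                break
            words.append(VOCAB[idx])
            state = self.cell.step(word_emb[idx], state)
        return words


def tell_story(images, encoder, decoders, word_emb):
    context = encode(images, encoder)
    # Every decoder starts from the same context but sees only its own image.
    return [dec.generate(context, img, word_emb) for dec, img in zip(decoders, images)]


rng = random.Random(0)
images = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(5)]  # stand-in image embeddings
word_emb = rand_matrix(len(VOCAB), EMB, rng)  # shared word-embedding table (assumption)
encoder = LSTMCell(EMB, HID, rng)
decoders = [Decoder(rng) for _ in images]
story = tell_story(images, encoder, decoders, word_emb)
for sentence in story:
    print(" ".join(sentence))
```

With untrained weights the printed words are meaningless. The sketch only shows the data flow: the encoder produces one shared context from the whole sequence, and each image position gets its own decoder, which starts from that context and consumes only its own image embedding.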