Hierarchically-Attentive RNN for Album Summarization and Storytelling
This work addresses visual storytelling for photo albums, which is an incremental improvement in a specific domain.
The paper tackles the problem of end-to-end visual storytelling by selecting representative photos from an album and generating a natural language story, achieving better performance than baselines in automatic and human evaluations.
We address the problem of end-to-end visual storytelling. Given a photo album, our model first selects the most representative (summary) photos, and then composes a natural language story for the album. For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story. Automatic and human evaluations show our model achieves better performance on selection, generation, and retrieval than baselines.