CL · Jan 17, 2021

Narration Generation for Cartoon Videos

arXiv:2101.06803v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses a novel task for enhancing video storytelling, but it is incremental, as it builds on existing multimodal text generation methods.

The paper tackles the problem of generating narration texts for cartoon videos, a task not previously addressed. It presents models for the task's two parts, timing and content generation, using a new dataset collected from the animated series Peppa Pig.

Research on text generation from multimodal inputs has largely focused on static images rather than video. In this paper, we propose a new task, narration generation: complementing videos with narration texts that are interjected at several points. The narrations are part of the video and contribute to the storyline unfolding in it. Moreover, they are context-informed: they include information appropriate for the timeframe of video they cover. Unlike captions, they need not describe every detail shown in the input scenes. We collect a new dataset from the animated television series Peppa Pig. Furthermore, we formalize narration generation as two separate tasks, timing and content generation, and present a set of models for the new task.
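The two-part formulation (timing, then content) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that an episode is pre-segmented into scenes with dialogue transcripts and that a narration slot is the position immediately before a scene. `Scene`, `generate_narrations`, and the two placeholder baselines are hypothetical names, not the paper's models or data format. The point is the interface: a timing model decides where narrations go, and a content model is called only for those slots.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Scene:
    """Hypothetical input unit: one span of the episode plus its dialogue."""
    start: float  # seconds from the start of the episode
    end: float
    dialogue: str


# Sub-task 1 (timing): for each scene, decide whether a narration is
# interjected immediately before it (assumed slot definition).
TimingModel = Callable[[List[Scene]], List[bool]]

# Sub-task 2 (content): given all scenes and a selected slot index,
# produce narration text for the timeframe that slot covers.
ContentModel = Callable[[List[Scene], int], str]


def generate_narrations(
    scenes: List[Scene], timing: TimingModel, content: ContentModel
) -> List[Optional[str]]:
    """Run timing first, then call content only at the selected slots."""
    slots = timing(scenes)
    if len(slots) != len(scenes):
        raise ValueError("timing model must return one decision per scene")
    return [content(scenes, i) if chosen else None for i, chosen in enumerate(slots)]


# Placeholder baselines standing in for learned models (illustrative only).
def first_scene_and_silences(scenes: List[Scene]) -> List[bool]:
    # Narrate at the opening and wherever a scene has no dialogue.
    return [i == 0 or not s.dialogue.strip() for i, s in enumerate(scenes)]


def describe_window(scenes: List[Scene], i: int) -> str:
    # Stub "content model": refers only to the timeframe of scene i,
    # echoing the idea that a narration covers a specific stretch of video.
    s = scenes[i]
    return f"Narration for {s.start:.0f}-{s.end:.0f}s"


# Toy usage with made-up scenes (not taken from the dataset).
episode = [
    Scene(0.0, 5.0, ""),
    Scene(5.0, 12.0, "Character A: Hello!"),
    Scene(12.0, 20.0, ""),
]
print(generate_narrations(episode, first_scene_and_silences, describe_window))
# ['Narration for 0-5s', None, 'Narration for 12-20s']
```

Keeping the two models behind separate callables mirrors the split described above. A timing model can be evaluated on its slot decisions alone, and a content model can be evaluated on gold slots, independently of each other.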
