CV | Jan 9, 2023

An Impartial Transformer for Story Visualization

arXiv:2301.03563v1 | 4 citations | h-index: 29
Originality: Incremental advance
AI Analysis

This paper addresses the computer vision task of sequential image synthesis, with applications such as storytelling and video generation, and represents an incremental improvement over existing methods.

The paper tackles the problem of story visualization by proposing an Impartial Transformer that generates realistic, text-relevant, and sequentially consistent images with minimal trainable parameters, achieving improved evaluation metrics even for challenging samples with occluded objects.

Story Visualization is an advanced computer vision task that targets sequential image synthesis, where the generated samples need to be realistic, faithful to their conditioning, and sequentially consistent. Our work proposes a novel architectural and training approach: the Impartial Transformer achieves both text-relevant, plausible scenes and sequential consistency while using as few trainable parameters as possible. This enhancement can even handle the synthesis of 'hard' samples with occluded objects, achieving improved evaluation metrics compared to past approaches.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
