CV, AI · Nov 27, 2024

Generative Visual Communication in the Era of Vision-Language Models

arXiv:2411.18727v1 · 2.0 · h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses the challenge of automating visual communication for designers and creators, but it appears incremental, as it builds on existing VLM capabilities by adding task-specific constraints.

The dissertation tackles the problem of automating effective visual communication design with vision-language models. It addresses two limitations of these models: they struggle to simplify complex ideas into clear, abstract visuals, and they are restricted to pixel-based outputs. To overcome these limitations, it constrains the models' operational space and introduces task-specific regularizations.

Visual communication, dating back to prehistoric cave paintings, is the use of visual elements to convey ideas and information. In today's visually saturated world, effective design demands an understanding of graphic design principles, visual storytelling, human psychology, and the ability to distill complex information into clear visuals. This dissertation explores how recent advancements in vision-language models (VLMs) can be leveraged to automate the creation of effective visual communication designs. Although generative models have made great progress in generating images from text, they still struggle to simplify complex ideas into clear, abstract visuals and are constrained by pixel-based outputs, which lack flexibility for many design tasks. To address these challenges, we constrain the models' operational space and introduce task-specific regularizations. We explore various aspects of visual communication, namely, sketches and visual abstraction, typography, animation, and visual inspiration.
