AI · IR · LG · Dec 5, 2021

Gaudí: Conversational Interactions with Deep Representations to Generate Image Collections

arXiv:2112.04404v1 · 3 citations
Originality: Incremental advance
AI Analysis

The work addresses a specific need of designers in early-stage creative processes and offers a novel conversational approach, but it is incremental in that it builds on existing models such as GPT-3 and CLIP.

The paper tackles the problem of creating mood-boards for designers by developing Gaudí, a conversational AI system that generates image collections from natural language, using GPT-3 and CLIP to transform sequential image searches into interactive conversations.

Based on recent advances in realistic language modeling (GPT-3) and cross-modal representations (CLIP), Gaudí was developed to help designers search for inspirational images using natural language. In the early stages of the design process, with the goal of eliciting a client's preferred creative direction, designers will typically create thematic collections of inspirational images called "mood-boards". Creating a mood-board involves sequential image searches which are currently performed using keywords or images. Gaudí transforms this process into a conversation where the user is gradually detailing the mood-board's theme. This representation allows our AI to generate new search queries from scratch, straight from a project briefing, following a theme hypothesized by GPT-3. Compared to previous computational approaches to mood-board creation, to the best of our knowledge, ours is the first attempt to represent mood-boards as the stories that designers tell when presenting a creative direction to a client.
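As a rough illustration of the retrieval idea described above, the sketch below ranks images by cosine similarity to a query vector and then shifts the query as the conversation adds detail. It is not Gaudí's implementation: the three-dimensional vectors are hypothetical stand-ins for CLIP text and image embeddings, and the weighted-average `refine` step, along with every function and file name here, is an assumption made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_images(query_vec, image_vecs, top_k=2):
    """Return the names of the top_k images most similar to the query."""
    scored = sorted(
        image_vecs.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]

def refine(theme_vec, turn_vec, weight=0.5):
    """Blend the current theme with a new utterance (assumed update rule)."""
    return [(1 - weight) * t + weight * u for t, u in zip(theme_vec, turn_vec)]

# Hypothetical embeddings; a real system would obtain these from CLIP.
IMAGES = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

theme = [1.0, 0.0, 0.0]                  # first utterance of the conversation
first = rank_images(theme, IMAGES)

theme = refine(theme, [0.0, 1.0, 0.0], weight=0.7)  # user adds detail
second = rank_images(theme, IMAGES)

print(first)
print(second)
```

With these toy vectors the top result moves from the beach image to the forest image once the second utterance is blended in, which mirrors how a theme can drift as the user gradually details the mood-board.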

