Scene Designer: a Unified Model for Scene Search and Synthesis from Sketch
Scene Designer addresses the need for efficient scene retrieval and generation tools in creative and design fields, though the contribution appears incremental, since it builds on existing methods such as GNNs and Transformers.
The paper tackles the problem of searching and generating images from free-hand sketches of scene compositions, proposing Scene Designer, a unified model that achieves state-of-the-art sketch-based visual search and synthesizes coherent scene layouts.
Scene Designer is a novel method for searching and generating images using free-hand sketches of scene compositions, i.e., drawings that describe both the appearance and relative positions of objects. Our core contribution is a single unified model to learn both a cross-modal search embedding for matching sketched compositions to images, and an object embedding for layout synthesis. We show that a graph neural network (GNN) followed by a Transformer under our novel contrastive learning setting is required to allow learning correlations between object type, appearance and arrangement, driving a mask generation module that synthesises coherent scene layouts, whilst also delivering state-of-the-art sketch-based visual search of scenes.
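To make the idea of a cross-modal search embedding concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive objective and a nearest-neighbour retrieval step. It is not the paper's loss: the abstract does not specify the formulation, and the GNN and Transformer encoders are omitted entirely. The function names (`info_nce_loss`, `retrieve`), the temperature value, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity_matrix(sketches, images):
    """Return sims[a][b] = cosine similarity of sketch a and image b."""
    s = [l2_normalize(v) for v in sketches]
    i = [l2_normalize(v) for v in images]
    return [[sum(a * b for a, b in zip(sv, iv)) for iv in i] for sv in s]


def info_nce_loss(sketches, images, temperature=0.1):
    """Generic contrastive loss (assumed form, not taken from the paper).

    Sketch k and image k are treated as a matching pair; every other image
    in the batch acts as a negative for sketch k.
    """
    sims = cosine_similarity_matrix(sketches, images)
    total = 0.0
    for k, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[k]  # cross-entropy for the true pair
    return total / len(sims)


def retrieve(sketch, images):
    """Rank image indices by descending cosine similarity to one sketch."""
    sims = cosine_similarity_matrix([sketch], images)[0]
    return sorted(range(len(images)), key=lambda j: -sims[j])


# Toy embeddings standing in for encoder outputs (hypothetical values).
sketch_emb = [[1.0, 0.0], [0.0, 1.0]]
image_emb = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = info_nce_loss(sketch_emb, image_emb)
misaligned_loss = info_nce_loss(sketch_emb, image_emb[::-1])
ranking = retrieve(sketch_emb[0], image_emb)
```

With correctly paired embeddings the loss is small, and it grows when the pairs are swapped. Minimising an objective of this kind pulls matching sketch and image embeddings together, which is what makes ranking images by similarity to a sketch usable as a search step.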