CVJul 9, 2024
Sketch-Guided Scene Image GenerationTianyu Zhang, Xiaoxuan Xie, Xusheng Du et al.
Text-to-image models are showcasing the impressive ability to create high-quality and diverse generative images. Nevertheless, the transition from freehand sketches to complex scene images remains challenging using diffusion models. In this study, we propose a novel sketch-guided scene image generation framework, decomposing the task of scene image scene generation from sketch inputs into object-level cross-domain generation and scene-level image construction. We employ pre-trained diffusion models to convert each single object drawing into an image of the object, inferring additional details while maintaining the sparse sketch structure. In order to maintain the conceptual fidelity of the foreground during scene generation, we invert the visual features of object images into identity embeddings for scene generation. In scene-level image construction, we generate the latent representation of the scene image using the separated background prompts, and then blend the generated foreground objects according to the layout of the sketch input. To ensure the foreground objects' details remain unchanged while naturally composing the scene image, we infer the scene image on the blended latent representation using a global prompt that includes the trained identity tokens. Through qualitative and quantitative experiments, we demonstrate the ability of the proposed approach to generate scene images from hand-drawn sketches surpasses the state-of-the-art approaches.
CVDec 1, 2024
Sketch-Guided Motion Diffusion for Stylized Cinemagraph SynthesisHao Jin, Hengyuan Chang, Xiaoxuan Xie et al.
Designing stylized cinemagraphs is challenging due to the difficulty in customizing complex and expressive flow motions. To achieve intuitive and detailed control of the generated cinemagraphs, freehand sketches can provide a better solution to convey personalized design requirements than only text inputs. In this paper, we propose Sketch2Cinemagraph, a sketch-guided framework that enables the conditional generation of stylized cinemagraphs from freehand sketches. Sketch2Cinemagraph adopts text prompts for initial content generation and provides hand-drawn sketch controls for both spatial and motion cues. The latent diffusion model is adopted to generate target stylized landscape images along with realistic versions. Then, a pre-trained object detection model is utilized to segment and obtain masks for the flow regions. We proposed a novel latent motion diffusion model to estimate the motion field in the fluid regions of the generated landscape images. The input motion sketches serve as the conditions to control the generated vector fields in the masked fluid regions with the prompt. To synthesize the cinemagraph frames, the pixels within fluid regions are subsequently warped to the target locations for each timestep using a frame generator. The results verified that Sketch2Cinemagraph can generate high-fidelity and aesthetically appealing stylized cinemagraphs with continuous temporal flow from intuitive sketch inputs. We showcase the advantages of Sketch2Cinemagraph through quantitative comparisons against the state-of-the-art generation approaches.