CV | AI | Jan 1, 2024

DiffMorph: Text-less Image Morphing with Diffusion Models

arXiv:2401.00739v1 | 4 citations | h-index: 2
Originality: Incremental advance
AI Analysis

DiffMorph addresses the challenge artists face in intuitively controlling AI image synthesis without relying on multiple images or textual descriptions. The advance is incremental, since it builds on existing diffusion models.

The paper tackles the problem of generating customized images without textual prompts by introducing DiffMorph, which uses artist-drawn sketches to morph an initial image into a new composition, achieving results comparable to prompt-based methods.

Text-conditioned image generation models are a prevalent use of AI image synthesis, yet intuitively controlling output guided by an artist remains challenging. Current methods require multiple images and textual prompts for each object to specify them as concepts to generate a single customized image. On the other hand, our work, DiffMorph, introduces a novel approach that synthesizes images that mix concepts without the use of textual prompts. Our work integrates a sketch-to-image module to incorporate user sketches as input. DiffMorph takes an initial image with conditioning artist-drawn sketches to generate a morphed image. We employ a pre-trained text-to-image diffusion model and fine-tune it to reconstruct each image faithfully. We seamlessly merge images and concepts from sketches into a cohesive composition. The image generation capability of our work is demonstrated through our results and a comparison of these with prompt-based image generation.

