LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge
The work addresses a need of professional artists by providing a tool for hierarchical image control, though the contribution is incremental because it builds on existing generative models.
The paper tackles the problem of synthesizing layered images by generating a composite image with an off-the-shelf model and then disassembling it into foreground and background layers, bypassing the need for large-scale training and enabling diverse content generation.
Layers have become indispensable tools for professional artists, allowing them to build a hierarchical structure that enables independent control over individual visual elements. In this paper, we propose LayeringDiff, a novel pipeline for the synthesis of layered images, which begins by generating a composite image using an off-the-shelf image generative model, followed by disassembling the image into its constituent foreground and background layers. By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training to develop generative capabilities for individual layers. Furthermore, by utilizing a pretrained off-the-shelf generative model, our method can produce diverse content and object scales in synthesized layers. For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate foreground and background layers. We also propose high-frequency alignment modules to refine the fine details of the estimated layers. Our comprehensive experiments demonstrate that our approach effectively synthesizes layered images and supports various practical applications.
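The abstract does not state how the estimated layers relate to the composite image. As a minimal sketch, assuming the common alpha-blending model (composite = alpha * foreground + (1 - alpha) * background), the snippet below shows how a decomposition could be sanity-checked by recomposing the layers and comparing against the original composite. The names `composite` and `reconstruction_error` are hypothetical and are not taken from the paper; images are plain nested lists of RGB tuples in [0, 1] to keep the example dependency-free.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]
Image = List[List[Pixel]]
Matte = List[List[float]]


def composite(foreground: Image, alpha: Matte, background: Image) -> Image:
    """Blend a foreground over a background with a per-pixel alpha matte.

    Assumes standard alpha blending; the paper's exact layer model may differ.
    """
    blended: Image = []
    for fg_row, a_row, bg_row in zip(foreground, alpha, background):
        row = []
        for fg_px, a, bg_px in zip(fg_row, a_row, bg_row):
            # Per-channel convex combination of foreground and background.
            row.append(tuple(a * f + (1.0 - a) * b for f, b in zip(fg_px, bg_px)))
        blended.append(row)
    return blended


def reconstruction_error(original: Image, recomposed: Image) -> float:
    """Largest absolute per-channel difference between two images."""
    return max(
        abs(o - r)
        for o_row, r_row in zip(original, recomposed)
        for o_px, r_px in zip(o_row, r_row)
        for o, r in zip(o_px, r_px)
    )


# Toy 1x3 example: a red foreground over a blue background.
red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
foreground = [[red, red, red]]
background = [[blue, blue, blue]]
alpha = [[1.0, 0.5, 0.0]]  # opaque, half-transparent, fully transparent

recomposed = composite(foreground, alpha, background)
```

In this toy case the middle pixel is an even mix of red and blue. With real layers, a small `reconstruction_error` against the generated composite would indicate that the estimated foreground, matte, and background are mutually consistent, though it says nothing about how plausible the hidden background regions look.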