Text Semantics to Image Generation: A method of building facades design base on Stable Diffusion model
This work addresses the need for better controllability in architectural design generation, but it is incremental as it builds on existing Stable Diffusion and ControlNet models.
The authors tackled the problem of low controllability in architectural image generation by proposing a multi-network method combining fine-tuned Stable Diffusion with ControlNet, resulting in significantly reduced fine-tuning costs and improved controllability for building facade images.
The Stable Diffusion model has been extensively employed in the study of architectural image generation, but there is still room to improve the controllability of the generated image content. This work proposes a multi-network method for generating building facade images from text. We first fine-tune the Stable Diffusion model on the CMP Facades dataset using the LoRA (Low-Rank Adaptation) approach, then apply the ControlNet model to further control the output. Finally, we compare the facade generation results under various architectural style text descriptions and control strategies. The results demonstrate that the LoRA training approach significantly reduces the cost of fine-tuning the large Stable Diffusion model, and that adding the ControlNet model increases the controllability of text-to-facade image generation. This provides a foundation for subsequent studies on the generation of architectural images.
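To make the fine-tuning cost argument concrete, the following is a minimal sketch of the general LoRA idea, not the authors' training code. LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) * B A, where only A and B are trained. The layer size, rank, and alpha used here are illustrative assumptions and are not taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    full: every entry of the d_out x d_in matrix is trained.
    lora: only B (d_out x rank) and A (rank x d_in) are trained.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the number of rows of A."""
    rank = len(A)
    scale = alpha / rank
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Assumed, illustrative sizes: one 768 x 768 projection with rank 4.
full, lora = lora_param_counts(d_out=768, d_in=768, rank=4)
ratio = lora / full

# Tiny numeric check: B starts at zero (the usual LoRA initialisation),
# so the adapted layer initially behaves exactly like the frozen one.
random.seed(0)
d, r = 3, 2
W = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
A = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(r)]
B = [[0.0 for _ in range(r)] for _ in range(d)]
W_eff = lora_effective_weight(W, A, B, alpha=4.0)
```

Under these assumed sizes, the low-rank factors hold 6,144 trainable values against 589,824 for the full matrix, roughly 1% of the original. This is the kind of saving that makes fine-tuning a large model on a small dataset such as CMP Facades practical.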