CV | Sep 27, 2022

Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion

Nisha Huang, Fan Tang, Weiming Dong, Changsheng Xu

arXiv:2209.13360v2 | 18.859 citations | h-index: 30 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more diverse and expressive digital art generation for the multimedia community, representing an incremental improvement by combining existing diffusion models with multimodal guidance.

The paper tackles limited expressiveness and diversity in digital art synthesis by proposing a multimodal guided artwork diffusion (MGAD) model, which uses text and image prompts to control a diffusion model. Extensive experiments confirm its effectiveness in generating digital art paintings.

Digital art synthesis is receiving increasing attention in the multimedia community because it engages the public with art effectively. Current digital art synthesis methods usually use single-modality inputs as guidance, thereby limiting the expressiveness of the model and the diversity of generated results. To solve this problem, we propose the multimodal guided artwork diffusion (MGAD) model, a diffusion-based digital artwork generation approach that uses multimodal prompts as guidance to control the classifier-free diffusion model. Additionally, the contrastive language-image pretraining (CLIP) model is used to unify text and image modalities. Extensive experimental results on the quality and quantity of the generated digital art paintings confirm the effectiveness of the combination of the diffusion model and multimodal guidance. Code is available at https://github.com/haha-lisa/MGAD-multimodal-guided-artwork-diffusion.
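
The abstract describes guiding a diffusion model with text and image prompts that share one CLIP embedding space. A minimal sketch of that idea follows; it is not the authors' implementation. It assumes the guidance signal is a weighted sum of cosine distances between the embedding of the current sample and the embeddings of each prompt. The function names, the weights, and the random vectors standing in for real CLIP embeddings are all hypothetical.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two 1-D vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b))


def multimodal_guidance_loss(sample_emb, text_emb, image_emb,
                             text_weight=1.0, image_weight=1.0):
    """Weighted sum of distances to the text and image prompt embeddings.

    Hypothetical formulation: a lower value means the sample sits closer
    to both prompts in the shared embedding space.
    """
    text_term = text_weight * cosine_distance(sample_emb, text_emb)
    image_term = image_weight * cosine_distance(sample_emb, image_emb)
    return text_term + image_term


# Stand-in embeddings (assumption): random vectors replace real CLIP outputs.
rng = np.random.default_rng(0)
dim = 512
text_emb = rng.normal(size=dim)
image_emb = rng.normal(size=dim)
sample_emb = rng.normal(size=dim)

loss = multimodal_guidance_loss(sample_emb, text_emb, image_emb)
print(f"guidance loss: {loss:.4f}")
```

In a guided sampler, the gradient of such a loss with respect to the intermediate image would steer each denoising step toward the prompts. Raising `text_weight` or `image_weight` shifts the balance between the two modalities.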
