CV AI · Jul 20, 2024

GreenStableYolo: Optimizing Inference Time and Image Quality of Text-to-Image Generation

Jingzhi Gong, Sisi Li, Giordano d'Aloisio, Zishuo Ding, Yulong Ye, William B. Langdon, Federica Sarro

arXiv:2407.14982v1 · 7.67 citations · h-index: 6 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses efficiency and quality optimization in text-to-image generation, though it appears incremental as it builds on existing methods like Stable Diffusion and Yolo.

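The paper tackles the challenge of tuning parameters and prompts for text-to-image generation by introducing GreenStableYolo, which uses NSGA-II and Yolo to optimize Stable Diffusion for both GPU inference time and image quality. Compared to StableYolo, which considers image quality only, the authors report a 266% reduction in inference time and a 526% higher hypervolume, at the cost of an 18% trade-off in image quality.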

Tuning the parameters and prompts for improving AI-based text-to-image generation has remained a substantial yet unaddressed challenge. Hence we introduce GreenStableYolo, which improves the parameters and prompts for Stable Diffusion to both reduce GPU inference time and increase image generation quality using NSGA-II and Yolo. Our experiments show that despite a relatively slight trade-off (18%) in image quality compared to StableYolo (which only considers image quality), GreenStableYolo achieves a substantial reduction in inference time (266% less) and a 526% higher hypervolume, thereby advancing the state-of-the-art for text-to-image generation.
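The hypervolume figure quoted above is a standard indicator for comparing multi-objective results: it measures the region of objective space dominated by a Pareto front, bounded by a reference point. The following is a minimal sketch for two minimized objectives, not the authors' implementation. The function names, the choice of objectives (inference time and a quality loss), and the sample values are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical objective pair: (inference_time, quality_loss), both minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front = []
    for p in points:
        # p is dominated if another point is no worse in both objectives.
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded by the reference point."""
    front = [p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    prev_y = ref[1]
    # Sorted by x ascending, so y strictly decreases along the front.
    for x, y in front:
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area


# Illustrative values only; these are not results from the paper.
candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
print(hypervolume_2d(candidates, ref=(4.0, 4.0)))
```

A larger hypervolume means the front covers more of the objective space relative to the reference point, which is why it is often used to compare the output of optimizers such as NSGA-II.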
