LG DC GR
Jan 29, 2025

Prompt-Aware Scheduling for Efficient Text-to-Image Inferencing System

Shubham Agarwal, Saud Iqbal, Subrata Mitra

arXiv:2502.06798v17.11 citations
h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses the performance degradation that users of text-to-image inference systems experience under high demand. It represents an incremental improvement in scheduling efficiency.

The paper tackles the inefficiency of traditional accuracy scaling for text-to-image models under high loads by introducing a system that optimally matches prompts to model instances at different approximation levels, achieving high-quality image generation within fixed budgets.

Traditional ML models utilize controlled approximations during high loads, employing faster but less accurate models in a process called accuracy scaling. However, this method is less effective for generative text-to-image models, both because their output quality is sensitive to the input prompt and because large model loading overheads degrade performance. This work introduces a novel text-to-image inference system that optimally matches prompts across multiple instances of the same model operating at various approximation levels to deliver high-quality images under high loads and fixed budgets.
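
The matching idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a simple greedy heuristic written under assumptions. The `Level` class, the `assign_prompts` function, the per-level `capacity` values, and the per-prompt quality estimates are all hypothetical names and inputs introduced here for illustration only. Prompts whose estimated quality drops most under approximation are served first by the most accurate level that still has capacity.

```python
from dataclasses import dataclass


@dataclass
class Level:
    name: str      # hypothetical label for an approximation level
    capacity: int  # assumed number of prompts this instance can serve per window


def assign_prompts(quality, levels):
    """Greedily map prompts to approximation levels.

    quality[p][i] is an assumed, precomputed quality estimate for prompt p
    at levels[i]; levels are ordered from most to least accurate.
    """
    remaining = [lvl.capacity for lvl in levels]
    # Prompts that lose the most quality under approximation are placed first.
    order = sorted(
        quality,
        key=lambda p: quality[p][0] - quality[p][-1],
        reverse=True,
    )
    assignment = {}
    for p in order:
        for i, lvl in enumerate(levels):
            if remaining[i] > 0:
                remaining[i] -= 1
                assignment[p] = lvl.name
                break
        else:
            assignment[p] = None  # no capacity left in this window
    return assignment


levels = [Level("full", 1), Level("approx", 2)]
quality = {
    "a": [0.90, 0.50],  # very sensitive to approximation
    "b": [0.80, 0.78],  # barely affected
    "c": [0.85, 0.70],
}
result = assign_prompts(quality, levels)
```

In this toy input, the single slot on the most accurate level goes to the prompt with the largest estimated quality drop, and the less sensitive prompts are routed to the approximate level. A real system would also need a way to estimate prompt sensitivity and to account for latency and cost, which this sketch leaves out.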

