CVAILGMMDec 2, 2024

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

arXiv:2412.01824v214 citationsh-index: 32
Originality Incremental advance
AI Analysis

This addresses the challenge of task generalization in image generation for AI researchers, though it is incremental as it builds on existing vision-language models.

The paper tackles the problem of enabling in-context learning for general image generation tasks in auto-regressive vision-language models, introducing X-Prompt, which achieves competitive performance across diverse seen and unseen tasks within a unified framework.

In-context generation is a key component of large language models' (LLMs) open-task generalization capability. By leveraging a few examples as context, LLMs can perform both in-domain and out-of-domain tasks. Recent advancements in auto-regressive vision-language models (VLMs) built upon LLMs have showcased impressive performance in text-to-image generation. However, the potential of in-context learning for general image generation tasks remains largely unexplored. To address this, we introduce X-Prompt, a purely auto-regressive large-vision language model designed to deliver competitive performance across a wide range of both seen and unseen image generation tasks, all within a unified in-context learning framework. X-Prompt incorporates a specialized design that efficiently compresses valuable features from in-context examples, supporting longer in-context token sequences and improving its ability to generalize to unseen tasks. A unified training task for both text and image prediction enables X-Prompt to handle general image generation with enhanced task awareness from in-context examples. Extensive experiments validate the model's performance across diverse seen image generation tasks and its capacity to generalize to previously unseen tasks.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes