CVAIMMApr 14, 2025

Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes

arXiv:2504.09948v33 citationsh-index: 7
Originality Incremental advance
AI Analysis

This addresses the need for culturally distinctive dish images in the digitized food industry and e-commerce, but it is incremental as it adapts existing methods to a specific domain.

The authors tackled the problem of generating photorealistic and faithful images of Chinese dishes, which existing text-to-image models struggle with, by proposing Omni-Dish, a specialized model that achieves superior results through a comprehensive dataset and training scheme.

Dish images play a crucial role in the digital era, with the demand for culturally distinctive dish images continuously increasing due to the digitization of the food industry and e-commerce. In general cases, existing text-to-image generation models excel in producing high-quality images; however, they struggle to capture diverse characteristics and faithful details of specific domains, particularly Chinese dishes. To address this limitation, we propose Omni-Dish, the first text-to-image generation model specifically tailored for Chinese dishes. We develop a comprehensive dish curation pipeline, building the largest dish dataset to date. Additionally, we introduce a recaption strategy and employ a coarse-to-fine training scheme to help the model better learn fine-grained culinary nuances. During inference, we enhance the user's textual input using a pre-constructed high-quality caption library and a large language model, enabling more photorealistic and faithful image generation. Furthermore, to extend our model's capability for dish editing tasks, we propose Concept-Enhanced P2P. Based on this approach, we build a dish editing dataset and train a specialized editing model. Extensive experiments demonstrate the superiority of our methods.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes