CV | Dec 6, 2023

FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation

arXiv:2312.03540v1 | 6 citations | h-index: 7 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for realistic food imagery in applications such as image-based dietary assessment, though it appears incremental because it adapts existing methods to a specific domain.

The paper addresses the tendency of existing Latent Diffusion Models to generate unrealistic food images. It introduces FoodFusion, a model engineered specifically for realistic food image synthesis from text, which the authors report improves significantly on public models in both realism and diversity.

Current state-of-the-art image generation models such as Latent Diffusion Models (LDMs) have demonstrated the capacity to produce visually striking food-related images. However, these generated images often exhibit an artistic or surreal quality that diverges from the authenticity of real-world food representations. This inadequacy renders them impractical for applications requiring realistic food imagery, such as training models for image-based dietary assessment. To address these limitations, we introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions. The development of the FoodFusion model involves harnessing an extensive array of open-source food datasets, resulting in over 300,000 curated image-caption pairs. Additionally, we propose and employ two distinct data cleaning methodologies to ensure that the resulting image-text pairs maintain both realism and accuracy. The FoodFusion model, thus trained, demonstrates a remarkable ability to generate food images that exhibit a significant improvement in terms of both realism and diversity over the publicly available image generation models. We openly share the dataset and fine-tuned models to support advancements in this critical field of food image synthesis at https://bit.ly/genai4good.

