Design a Delicious Lunchbox in Style
This work addresses a domain-specific problem in food presentation that is relevant to designers and AI applications. It is incremental in that it builds on previous research.
The paper tackles text-to-image synthesis for designing lunchbox scenes with multiple occluded objects. Its contributions are the Bento800 dataset and models for layout prediction and image composition.
We propose a cyclic generative adversarial network with spatial-wise and channel-wise attention modules for text-to-image synthesis. To accurately depict and design scenes with multiple occluded objects, we design a pre-trained ordering recovery model and a generative adversarial network that predict layouts and composite novel box lunch presentations. In the experiments, we devise the Bento800 dataset to evaluate both the text-to-image synthesis model and the layout generation and image composition model. This paper is a continuation of our previous work. We also present additional experiments and qualitative comparisons to verify the effectiveness of the proposed method. The Bento800 dataset is available at https://github.com/Yutong-Zhou-cv/Bento800_Dataset
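As a rough illustration of what channel-wise and spatial-wise attention do to a feature map, the following is a minimal sketch and not the modules used in this paper. It assumes a parameter-free simplification in which the gates come from plain average pooling followed by a sigmoid, whereas real attention modules normally use learned layers. The function names `channel_attention` and `spatial_attention` and the toy feature map are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel of feat[c][h][w] by one gate in (0, 1).

    Assumption: the gate is sigmoid(global average of the channel);
    a learned module would insert trainable layers before the sigmoid.
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each spatial location of feat[c][h][w] by one gate in (0, 1).

    Assumption: the gate is sigmoid(mean over channels at that location);
    a learned module would typically use a convolution instead.
    """
    num_c = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    gates = [
        [sigmoid(sum(feat[c][i][j] for c in range(num_c)) / num_c)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[feat[c][i][j] * gates[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(num_c)
    ]


# Toy feature map: 2 channels, 2x2 spatial grid (illustrative values only).
feat = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
refined = spatial_attention(channel_attention(feat))
```

Applying the channel gate first and the spatial gate second is one common ordering. The abstract does not state which ordering the proposed network uses.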