CV
Mar 30, 2022

Learning Program Representations for Food Images and Cooking Recipes

arXiv:2203.16071v1
43 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating interpretable, manipulable representations for instructional procedures such as cooking, which benefits both users and AI agents. The advance is incremental, however, because it applies programmatic structures to one specific domain.

The paper models cooking recipes and food images as structured cooking programs. It reports better cross-modal retrieval when image-recipe embeddings are projected into programs, better recognition when programs rather than raw instructions are generated from images, and food image generation driven by program manipulation.

In this paper, we are interested in modeling a how-to instructional procedure, such as a cooking recipe, with a meaningful and rich high-level representation. Specifically, we propose to represent cooking recipes and food images as cooking programs. Programs provide a structured representation of the task, capturing cooking semantics and sequential relationships of actions in the form of a graph. This allows them to be easily manipulated by users and executed by agents. To this end, we build a model that is trained to learn a joint embedding between recipes and food images via self-supervision and jointly generate a program from this embedding as a sequence. To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to better recognition results compared to predicting raw cooking instructions; and (c) we can generate food images by manipulating programs via optimizing the latent code of a GAN. Code, data, and models are available online.
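To make the idea of a cooking program concrete, here is a minimal Python sketch of a recipe as a small graph of steps that can be flattened into a sequence and edited. The `Step` class, the `out<i>` naming, and the action and ingredient names are all hypothetical and chosen for illustration; they are not the program grammar, vocabulary, or data format used in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Step:
    """One node of a hypothetical cooking program (illustrative only)."""
    action: str
    ingredients: List[str] = field(default_factory=list)
    inputs: List[int] = field(default_factory=list)  # indices of earlier steps


def to_sequence(program: List[Step]) -> List[str]:
    """Flatten the program into one text line per step."""
    lines = []
    for i, step in enumerate(program):
        args = step.ingredients + [f"out{j}" for j in step.inputs]
        lines.append(f"out{i} = {step.action}({', '.join(args)})")
    return lines


def edges(program: List[Step]) -> List[Tuple[int, int]]:
    """Return the dependency edges (producer step, consumer step)."""
    return [(j, i) for i, step in enumerate(program) for j in step.inputs]


def swap_ingredient(program: List[Step], old: str, new: str) -> List[Step]:
    """Return an edited copy of the program with one ingredient replaced."""
    return [
        Step(s.action, [new if x == old else x for x in s.ingredients], list(s.inputs))
        for s in program
    ]


# Assumed toy recipe, not taken from the paper's dataset.
program = [
    Step("chop", ["onion"]),
    Step("boil", ["pasta", "water"]),
    Step("saute", ["olive_oil"], inputs=[0]),
    Step("mix", inputs=[1, 2]),
]

for line in to_sequence(program):
    print(line)
print(edges(program))
print(to_sequence(swap_ingredient(program, "onion", "garlic"))[0])
```

The sketch only illustrates why a program is easy to manipulate: swapping an ingredient is a local edit on one node, and the dependency edges stay intact. In the paper, such an edited program would then drive image generation by optimizing a GAN latent code, which this sketch does not attempt.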
