CL AI Sep 13, 2022

Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows

arXiv:2209.05840v1583 citations h-index: 28
Originality: Synthesis-oriented
AI Analysis

This dataset targets multimodal learning for cooking tasks. Its contribution is incremental: it builds on existing datasets by adding visual state changes and graph-based workflows.

The authors introduce the Visual Recipe Flow dataset for learning the result of each cooking action in a recipe text. It consists of object state changes, represented as image pairs, and the recipe workflow, represented as a recipe flow graph, and it enables applications such as multimodal commonsense reasoning and procedural text generation.

We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn each cooking action result in a recipe text. The dataset consists of object state changes and the workflow of the recipe text. The state change is represented as an image pair, while the workflow is represented as a recipe flow graph (r-FG). The image pairs are grounded in the r-FG, which provides the cross-modal relation. With our dataset, one can try a range of applications, from multimodal commonsense reasoning to procedural text generation.
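
The structure described in the abstract (a flow graph over recipe expressions, with before/after image pairs attached to action nodes) can be sketched in Python as below. This is only an illustration under assumptions: the class names, field names, tag strings, edge label, and image file names are all hypothetical and are not taken from the dataset's actual schema or release format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One tagged expression from the recipe text (hypothetical schema)."""
    node_id: int
    text: str
    tag: str  # illustrative tags: "food", "tool", "action"


@dataclass(frozen=True)
class ImagePair:
    """Object state before and after a single cooking action."""
    action_node_id: int  # the graph node this pair is grounded in
    before_image: str    # hypothetical file name
    after_image: str     # hypothetical file name


@dataclass
class RecipeFlow:
    """A recipe flow graph plus the image pairs grounded in it."""
    nodes: dict[int, Node] = field(default_factory=dict)
    edges: list[tuple[int, int, str]] = field(default_factory=list)
    image_pairs: list[ImagePair] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: int, dst: int, label: str) -> None:
        # Both endpoints must already exist in the graph.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst, label))

    def ground(self, pair: ImagePair) -> None:
        # In this sketch, image pairs may only attach to action nodes.
        node = self.nodes[pair.action_node_id]
        if node.tag != "action":
            raise ValueError(f"node {node.node_id} is not an action")
        self.image_pairs.append(pair)

    def state_changes_for(self, action_node_id: int) -> list[ImagePair]:
        """Return the image pairs grounded in the given action node."""
        return [p for p in self.image_pairs
                if p.action_node_id == action_node_id]


def build_example() -> RecipeFlow:
    """Toy instance for the instruction 'Cut the tomato.'"""
    flow = RecipeFlow()
    flow.add_node(Node(1, "tomato", "food"))
    flow.add_node(Node(2, "cut", "action"))
    flow.add_edge(1, 2, "target")  # illustrative edge label
    flow.ground(ImagePair(2, "tomato_whole.jpg", "tomato_sliced.jpg"))
    return flow
```

Calling `build_example().state_changes_for(2)` returns the single image pair attached to the "cut" action. Keeping the graph and the image pairs in one object with a shared node ID is one simple way to express the cross-modal relation; the dataset itself may organize this differently.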

