CL · CV · Sep 4, 2018

RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes

arXiv:1809.00812v1 · 145 citations
Originality: Synthesis-oriented
AI Analysis

RecipeQA provides a benchmark for evaluating multimodal comprehension systems in the domain of cooking recipes, though the contribution is incremental, since it builds on existing dataset efforts.

The authors introduce RecipeQA, a dataset of about 20K cooking recipes that combine text and images, to tackle machine comprehension of procedural knowledge. The dataset includes over 36K question-answer pairs for evaluation.

Understanding and reasoning about cooking recipes is a fruitful research direction towards enabling machines to interpret procedural text. In this work, we introduce RecipeQA, a dataset for multimodal comprehension of cooking recipes. It comprises approximately 20K instructional recipes with multiple modalities such as titles, descriptions and an aligned set of images. With over 36K automatically generated question-answer pairs, we design a set of comprehension and reasoning tasks that require joint understanding of images and text, capturing the temporal flow of events and making sense of procedural knowledge. Our preliminary results indicate that RecipeQA will serve as a challenging test bed and an ideal benchmark for evaluating machine comprehension systems. The data and leaderboard are available at http://hucvl.github.io/recipeqa.

