PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes
This work addresses commonsense reasoning in AI for tasks such as following cooking instructions. The contribution is incremental: it targets a single domain and its main addition is a new dataset.
The paper tackles the problem of enabling machines to understand procedural texts such as cooking recipes by modeling commonsense reasoning about intermediate steps. It introduces a new benchmark on which GPT-4 achieves only 26% human-evaluated preference, indicating significant room for improvement.
Understanding procedural texts, such as cooking recipes, is essential for enabling machines to follow instructions and reason about tasks, a key aspect of intelligent reasoning. In cooking, these instructions can be interpreted as a series of modifications to a food preparation. For a model to effectively reason about cooking recipes, it must accurately discern and understand the inputs and outputs of intermediate steps within the recipe. We present a new corpus of cooking recipes enriched with descriptions of intermediate steps that describe the input and output for each step. PizzaCommonsense serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit input-output descriptions to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-4 achieves only 26% human-evaluated preference for generations, leaving room for future improvements.
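To make the idea of per-step input and output descriptions concrete, here is a minimal Python sketch. It is not the dataset's actual schema: the `Step` class, its field names, the `dangling_inputs` helper, and the example recipe are all hypothetical, chosen only to illustrate how a recipe can be read as a chain of modifications in which each step consumes earlier outputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One recipe instruction with explicit input/output descriptions.

    Field names are hypothetical, not taken from the corpus.
    """
    instruction: str      # the instruction as written in the recipe
    inputs: List[str]     # what the step consumes
    output: str           # the intermediate product the step yields


def dangling_inputs(steps: List[Step], raw_ingredients: List[str]) -> List[str]:
    """Return inputs that are neither raw ingredients nor an earlier step's output."""
    available = set(raw_ingredients)
    missing = []
    for step in steps:
        for item in step.inputs:
            if item not in available:
                missing.append(item)
        # The output of this step becomes available to later steps.
        available.add(step.output)
    return missing


# Illustrative recipe, invented for this sketch.
recipe = [
    Step("Mix flour, water and yeast.", ["flour", "water", "yeast"], "dough"),
    Step("Roll out the dough.", ["dough"], "pizza base"),
    Step("Spread sauce over the base.", ["pizza base", "tomato sauce"], "sauced base"),
]

print(dangling_inputs(recipe, ["flour", "water", "yeast", "tomato sauce"]))
print(dangling_inputs(recipe, ["flour", "water", "yeast"]))
```

The check is deliberately simple (exact string matching on names). The benchmark described above asks for free-text descriptions, which would require much looser matching than this sketch attempts.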