AI | CL | LG | Feb 17, 2023

Unsupervised Task Graph Generation from Instructional Video Transcripts

arXiv:2302.09173v2225 citations | h-index: 18
AI Analysis

This work addresses the challenge of automatically understanding real-world activities from video data. The contribution is incremental: it builds on prior formulations of the problem and adds a new unsupervised method.

The paper tackles the problem of generating task graphs from instructional video transcripts by identifying the key steps of a task and the dependencies between them. It reports that the unsupervised approach produces more accurate task graphs than a supervised learning approach on tasks from the ProceL and CrossTask datasets.

This work explores the problem of generating task graphs of real-world activities. Different from prior formulations, we consider a setting where text transcripts of instructional videos performing a real-world activity (e.g., making coffee) are provided and the goal is to identify the key steps relevant to the task as well as the dependency relationship between these key steps. We propose a novel task graph generation approach that combines the reasoning capabilities of instruction-tuned language models along with clustering and ranking components to generate accurate task graphs in a completely unsupervised manner. We show that the proposed approach generates more accurate task graphs compared to a supervised learning approach on tasks from the ProceL and CrossTask datasets.
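To make the notion of a task graph concrete, the following is a minimal sketch of how dependency edges between key steps could be inferred from ordered step sequences. It is not the method proposed in the paper: it assumes the key steps have already been extracted and canonicalized (the role the abstract assigns to the language model and clustering components), and it swaps in a simple precedence-counting heuristic for the dependency inference. The function names `build_task_graph` and `_has_alternate_path`, the parameters `min_support` and `min_agreement`, and the coffee example data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def _has_alternate_path(succ, start, goal):
    """Return True if goal is reachable from start without the direct edge."""
    skip_edge = (start, goal)
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, ()):
            if (node, nxt) == skip_edge or nxt in seen:
                continue
            if nxt == goal:
                return True
            seen.add(nxt)
            stack.append(nxt)
    return False


def build_task_graph(step_sequences, min_support=2, min_agreement=1.0):
    """Infer 'a precedes b' edges from ordered key-step sequences.

    Assumption (not from the paper): an edge a -> b is kept when a and b
    co-occur in at least `min_support` sequences and a comes first in at
    least a `min_agreement` fraction of those sequences.
    """
    before = defaultdict(int)    # (a, b) -> number of sequences with a before b
    together = defaultdict(int)  # {a, b} -> number of sequences containing both

    for seq in step_sequences:
        # Keep only the first occurrence of each step, preserving order.
        steps = list(dict.fromkeys(seq))
        for a, b in combinations(steps, 2):
            before[(a, b)] += 1
            together[frozenset((a, b))] += 1

    edges = set()
    for (a, b), count in before.items():
        total = together[frozenset((a, b))]
        if total >= min_support and count / total >= min_agreement:
            edges.add((a, b))

    # Drop edges implied by longer paths (transitive reduction). This is
    # well defined only if the edge set is acyclic, which sparse or noisy
    # data does not guarantee.
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    return {(a, b) for a, b in edges if not _has_alternate_path(succ, a, b)}


# Hypothetical step sequences for "making coffee", one per transcript.
example_sequences = [
    ["boil water", "grind beans", "add grounds", "pour water"],
    ["grind beans", "boil water", "add grounds", "pour water"],
    ["grind beans", "add grounds", "boil water", "pour water"],
]
graph = build_task_graph(example_sequences)
```

In this toy example, "boil water" and "grind beans" appear in both orders across transcripts, so no dependency is inferred between them, while "pour water" always comes last. The direct edge from "grind beans" to "pour water" is dropped because it is already implied by the path through "add grounds".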
