CV · Jul 16, 2022

SVGraph: Learning Semantic Graphs from Instructional Videos

arXiv:2207.08001v1 · 5 citations · h-index: 22
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of video understanding for researchers and practitioners by providing an interpretable method, though it appears incremental because it builds on existing multi-modal and self-supervised techniques.

The paper tackles the problem of generating graphical representations from noisy instructional videos without requiring expensive annotations. It proposes SVGraph, a self-supervised, multi-modal approach that uses narrations to make the learned graphs semantically interpretable, and it demonstrates this interpretability on multiple datasets.

In this work, we focus on generating graphical representations of noisy, instructional videos for video understanding. We propose a self-supervised, interpretable approach that does not require any annotations for graphical representations, which would be expensive and time-consuming to collect. We attempt to overcome "black box" learning limitations by presenting Semantic Video Graph, or SVGraph, a multi-modal approach that utilizes narrations for semantic interpretability of the learned graphs. SVGraph 1) relies on the agreement between multiple modalities to learn a unified graphical structure with the help of cross-modal attention and 2) assigns semantic interpretation with the help of Semantic-Assignment, which captures the semantics from video narration. We perform experiments on multiple datasets and demonstrate the interpretability of SVGraph in semantic graph learning.
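The two ingredients named in the abstract, cross-modal attention and a narration-based semantic assignment, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names (`cross_modal_attention`, `assign_semantics`), the feature dimensions, the random features, and the nearest-word labelling rule are all assumptions chosen only to show the general idea.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, keys, values):
    # Scaled dot-product attention: queries come from one modality
    # (here video), keys and values from another (here narration).
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

def assign_semantics(node_embeddings, word_embeddings, vocabulary):
    # Hypothetical stand-in for a semantic assignment step: label each
    # node with the narration word of highest cosine similarity.
    n = node_embeddings / np.linalg.norm(node_embeddings, axis=1, keepdims=True)
    w = word_embeddings / np.linalg.norm(word_embeddings, axis=1, keepdims=True)
    similarity = n @ w.T
    return [vocabulary[i] for i in similarity.argmax(axis=1)]

rng = np.random.default_rng(0)
video_feats = rng.normal(size=(4, 8))  # 4 made-up video-segment features
text_feats = rng.normal(size=(3, 8))   # 3 made-up narration-word features
vocab = ["pour", "stir", "cut"]        # illustrative narration vocabulary

# Video segments attend over narration words, then each attended
# representation is labelled with its closest narration word.
attended, weights = cross_modal_attention(video_feats, text_feats, text_feats)
labels = assign_semantics(attended, text_feats, vocab)
```

Each row of `weights` is a distribution over narration words for one video segment, and `labels` holds one word per segment. A real system would learn both feature sets with self-supervision rather than sampling them at random.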

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
