AI CV LG MM | Apr 1, 2019

Constructing Hierarchical Q&A Datasets for Video Story Understanding

Yu-Jung Heo, Kyoung-Woon On, Seongho Choi, Jaeseo Lim, Jinah Kim, Jeh-Kwang Ryu, Byung-Chull Bae, Byoung-Tak Zhang

arXiv:1904.00623v1 | 7.55 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more nuanced benchmarks in video understanding research, though it appears incremental as it builds on existing Q&A dataset approaches.

The paper tackles the problem of biased and low-variance video Q&A datasets by proposing a hierarchical method to construct datasets based on criteria like memory capacity and logical complexity, aiming to better measure intelligence in video story understanding.

Video understanding is emerging as a new paradigm for studying human-like AI. Question answering (Q&A) is used as a general benchmark to measure the level of intelligence in video understanding. While several previous studies have proposed datasets for video Q&A tasks, they did not really incorporate story-level understanding, resulting in datasets that are highly biased and lack variance in question difficulty. In this paper, we propose a hierarchical method for building Q&A datasets, i.e., datasets organized by hierarchical difficulty levels. We introduce three criteria for video story understanding: memory capacity, logical complexity, and the DIKW (Data-Information-Knowledge-Wisdom) pyramid. We discuss how a three-dimensional map constructed from these criteria can be used as a metric for evaluating the levels of intelligence related to video story understanding.
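As a rough illustration of the three-dimensional map described in the abstract, the sketch below tags each Q&A pair with a coordinate along the three criteria and counts how many questions fall into each cell. The abstract does not specify how the levels are encoded, so the class name `QAItem`, the integer levels for memory capacity and logical complexity, the helper names, and the sample questions are all assumptions made for illustration, not the authors' scheme.

```python
from collections import Counter
from dataclasses import dataclass

# The four DIKW tiers, ordered from lowest to highest.
DIKW_LEVELS = ("data", "information", "knowledge", "wisdom")


@dataclass(frozen=True)
class QAItem:
    """One question-answer pair with hypothetical difficulty annotations."""
    question: str
    answer: str
    memory_capacity: int     # assumed ordinal level (higher = longer video span)
    logical_complexity: int  # assumed ordinal level (higher = more reasoning)
    dikw: str                # one of DIKW_LEVELS


def difficulty_coordinate(item):
    """Map an item to a point (memory, logic, DIKW index) in the 3-D map."""
    return (item.memory_capacity,
            item.logical_complexity,
            DIKW_LEVELS.index(item.dikw))


def coverage(items):
    """Count items per cell; empty or sparse cells hint at low difficulty variance."""
    return Counter(difficulty_coordinate(item) for item in items)


# Made-up sample items, for illustration only.
sample = [
    QAItem("Who is in the room?", "Two people", 1, 1, "data"),
    QAItem("What is on the table?", "A cup", 1, 1, "data"),
    QAItem("Why did she leave?", "She was upset", 2, 3, "knowledge"),
]
cells = coverage(sample)
```

Inspecting `cells` shows which regions of the map a dataset covers; a dataset concentrated in a single cell would be one way to read the bias the abstract mentions.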
