Toward a Human-Level Video Understanding Intelligence
This work addresses the lack of evaluation methods for video understanding AI, a gap that matters to researchers and developers in AI and computer vision.
The authors tackled the challenge of evaluating AI agents' video understanding by proposing the Video Turing Test, a method for assessing both intelligence and human-likeness, and demonstrated its effectiveness through a case study.
We aim to develop an AI agent that can watch video clips and converse with a human about the video's story. Developing video understanding intelligence is a highly challenging task, and evaluation methods that adequately measure and analyze the progress of AI agents are also lacking. In this paper, we propose the Video Turing Test to provide effective and practical assessment of both the video understanding intelligence and the human-likeness of AI agents. We define a general format and procedure for the Video Turing Test and present a case study to confirm the effectiveness and usefulness of the proposed test.