CV CL LG | Mar 26, 2023

GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation

Ji Qi, Jifan Yu, Teng Tu, Kunyu Gao, Yifan Xu, Xinyu Guan, Xiaozhi Wang, Yuxiao Dong, Bin Xu, Lei Hou, Juanzi Li, Jie Tang

Peking U, Tsinghua

arXiv:2303.14655v217.539 citations | h-index: 47 | Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of generating detailed, knowledge-based video descriptions for applications such as automatic sports commentary, but its contribution is incremental, since it adapts existing methods to a new benchmark.

The authors introduce GOAL, a benchmark for knowledge-grounded video captioning with over 8.9k soccer video clips and 42k knowledge triples, and adapt existing methods to demonstrate the task's difficulty and potential directions.

Despite the recent emergence of video captioning models, generating vivid, fine-grained video descriptions grounded in background knowledge (i.e., long and informative commentary about domain-specific scenes with appropriate reasoning) is still far from solved, even though it has valuable applications such as automatic sports narration. In this paper, we present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples, which proposes a challenging new task setting: Knowledge-grounded Video Captioning (KGVC). Moreover, we experimentally adapt existing methods to show the difficulty of this valuable and applicable task and potential directions for solving it. Our data and code are available at https://github.com/THU-KEG/goal.
