CV | Sep 16, 2021

Overview of Tencent Multi-modal Ads Video Understanding Challenge

Zhenzhi Wang, Liyu Wu, Zhimin Li, Jiangfeng Xiong, Qinglin Lu

arXiv:2109.07951v12.65 citations

Originality Synthesis-oriented

AI Analysis

This challenge targets comprehensive understanding of ads videos, for both researchers and industry. Its contribution is incremental: it builds on existing video analysis methods and adapts them to ads-specific features.

The paper introduces the first grand challenge for multi-modal ads video understanding. It comprises two tasks, temporal video structuring and multi-modal classification, in which participants predict scene boundaries and per-scene categories. The goal is to advance ads video applications such as recommendation.

The Multi-modal Ads Video Understanding Challenge is the first grand challenge aiming to comprehensively understand ads videos. Our challenge includes two tasks: video structuring in the temporal dimension and multi-modal video classification. It asks participants to accurately predict both the scene boundaries and the multi-label categories of each scene, based on a fine-grained, ads-related category hierarchy. Our task therefore differs from previous ones in four respects: ads domain, multi-modal information, temporal segmentation, and multi-label classification. It will advance the foundation of ads video understanding and have a significant impact on many ads applications, such as video recommendation. This paper presents an overview of our challenge, including the background of ads videos, a detailed description of the task and dataset, the evaluation protocol, and our proposed baseline. By ablating the key components of our baseline, we aim to reveal the main challenges of this task and provide useful guidance for future research in this area. In this paper, we give an extended version of our challenge overview. The dataset will be publicly available at https://algo.qq.com/.
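
As an illustration of the prediction target described in the abstract (scene boundaries plus multi-label categories per scene), the following is a minimal Python sketch. It is not the challenge's official submission format or baseline: the `Scene` class, the `build_scenes` helper, the use of seconds as the time unit, and the label names are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scene:
    """One predicted scene: a time span plus a set of category labels."""
    start: float  # seconds (assumed unit)
    end: float    # seconds (assumed unit)
    labels: frozenset = field(default_factory=frozenset)


def build_scenes(duration, boundaries, labels_per_scene):
    """Turn interior scene boundaries into contiguous, labeled scenes.

    `boundaries` are the predicted cut points strictly inside (0, duration);
    `labels_per_scene` holds one iterable of labels per resulting scene.
    """
    cuts = [0.0] + sorted(boundaries) + [float(duration)]
    if any(a >= b for a, b in zip(cuts, cuts[1:])):
        raise ValueError("boundaries must be distinct and lie strictly inside (0, duration)")
    if len(labels_per_scene) != len(cuts) - 1:
        raise ValueError("need exactly one label set per scene")
    return [
        Scene(start, end, frozenset(labels))
        for start, end, labels in zip(cuts, cuts[1:], labels_per_scene)
    ]


# Hypothetical label names, for illustration only.
scenes = build_scenes(
    duration=15.0,
    boundaries=[4.2, 9.8],
    labels_per_scene=[{"product_display"}, {"narration", "indoor"}, {"call_to_action"}],
)
```

Two predicted boundaries yield three contiguous scenes, and each scene may carry several labels at once, which is what distinguishes the multi-label setting from single-label scene classification.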

