CV AI Jun 29, 2022

Technical Report for CVPR 2022 LOVEU AQTC Challenge

Hyeonyu Kim, Jongeun Kim, Jeonghun Kang, Sanguk Park, Dongchan Park, Taehwan Kim

arXiv:2206.14555v1 1.4 h-index: 3 Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a specific video understanding challenge for researchers in computer vision, but it is incremental as it builds on existing tasks and methods.

The paper tackles the AQTC task in the CVPR 2022 LOVEU challenge, which involves multi-step answers, multi-modal video data, and diverse button representations. It proposes a new context ground module attention mechanism for better feature mapping, and it placed 2nd overall and 1st on two of the four evaluation metrics.

This technical report presents the second-place model for AQTC, a task newly introduced in the CVPR 2022 LOng-form VidEo Understanding (LOVEU) challenge. The task is difficult because it involves multi-step answers, multi-modal data, and diverse and changing button representations in video. We address this by proposing a new context ground module attention mechanism for more effective feature mapping. We also analyze the effect of the number of buttons and run an ablation study over different step networks and video features. As a result, we achieved 2nd place overall in LOVEU competition track 3, and 1st place on two of the four evaluation metrics. Our code is available at https://github.com/jaykim9870/CVPR-22_LOVEU_unipyler.
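The abstract does not describe the internals of the context ground module, so the following is only a generic illustration of attention-based feature mapping, not the authors' method. It is a minimal sketch of standard scaled dot-product attention in plain Python, assuming a single query vector (a hypothetical answer-step feature) that attends over a small set of hypothetical button features. The names `attend`, `step_feature`, and `button_features` are invented for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query:  list of d floats (assumed: an answer-step feature)
    keys:   list of vectors, each of d floats (assumed: button features)
    values: list of vectors, one per key
    Returns (context_vector, attention_weights).
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # Weighted sum of the value vectors.
    dim_v = len(values[0])
    context = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(dim_v)
    ]
    return context, weights


# Toy data (hypothetical): one step feature and three button features.
step_feature = [1.0, 0.0]
button_features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

context, weights = attend(step_feature, button_features, button_features)
print(weights)   # largest weight goes to the button most similar to the step
print(context)
```

In this toy setup the button whose feature best aligns with the step feature receives the largest weight, which is the basic sense in which attention maps one set of features onto another. The actual module in the report may differ in every detail; the linked repository is the authoritative source.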
