CV | AI | Jun 29, 2022

Technical Report for CVPR 2022 LOVEU AQTC Challenge

arXiv:2206.14555v1 | h-index: 3 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific video understanding challenge for researchers in computer vision, but it is incremental as it builds on existing tasks and methods.

The paper tackles the AQTC task of the CVPR 2022 LOVEU challenge, which requires multi-step answers over multi-modal video data with diverse button representations. It proposes a new context ground module attention mechanism for better feature mapping, and it places 2nd overall and 1st on two of the four evaluation metrics.

This technical report presents the second-place model for AQTC, a task newly introduced in the CVPR 2022 LOng-form VidEo Understanding (LOVEU) challenge. The task is difficult because it involves multi-step answers, multi-modal inputs, and diverse, changing button representations in video. We address these difficulties by proposing a new context ground module attention mechanism for more effective feature mapping. We also analyze the effect of the number of buttons and run ablation studies over different step networks and video features. As a result, we achieved 2nd place overall in LOVEU competition track 3, and 1st place in two of the four evaluation metrics. Our code is available at https://github.com/jaykim9870/CVPR-22_LOVEU_unipyler.
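The abstract does not spell out how the context ground module works, so the following is only a generic illustration of attention-based feature mapping, not the authors' method. It is a minimal sketch of plain scaled dot-product attention, assuming a single query vector (for example, a button feature) that attends over a handful of context vectors. The names `attend`, `button_query`, and `context_features`, and the toy values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context):
    """Scaled dot-product attention of one query over a list of context vectors.

    Returns the attention weights and the weighted sum of the context vectors.
    """
    dim = len(query)
    # Similarity of the query to each context vector, scaled by sqrt(dim).
    scores = [
        sum(q * c for q, c in zip(query, ctx)) / math.sqrt(dim)
        for ctx in context
    ]
    weights = softmax(scores)
    # Weighted combination of context vectors: the query "grounded" in context.
    grounded = [
        sum(w * ctx[i] for w, ctx in zip(weights, context))
        for i in range(dim)
    ]
    return weights, grounded


# Hypothetical toy features, chosen only for illustration.
button_query = [1.0, 0.0]
context_features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, grounded = attend(button_query, context_features)
```

In this toy setup the context vector most similar to the query receives the largest weight, which is the sense in which attention maps one feature onto its most relevant context.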

Code Implementations: 1 repo