CV AI LG Sep 20, 2024

First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge

arXiv:2409.13538v1
h-index: 6
Originality: Synthesis-oriented
AI Analysis

This work addresses video understanding for AI competitions, but it is incremental as it applies existing methods to a specific challenge.

The authors tackled the Multiple-choice Video QA track of The Second Perception Test Challenge by fine-tuning QwenVL2 (7B) with ensemble strategies and Test Time Augmentation, achieving a Top-1 Accuracy of 0.7647 and securing first place.

In this report, we present our first-place solution to the Multiple-choice Video Question Answering (QA) track of The Second Perception Test Challenge. This competition posed a complex video understanding task, requiring models to accurately comprehend and answer questions about video content. To address this challenge, we fine-tuned the QwenVL2 (7B) model on the provided training set. Additionally, we employed model ensemble strategies and Test Time Augmentation to boost performance. Through continuous optimization, our approach achieved a Top-1 Accuracy of 0.7647 on the leaderboard.
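
The abstract does not say how the ensemble and Test Time Augmentation outputs were combined, so the following is only a minimal sketch of one common approach, assuming each model or augmented view yields a probability for every answer option and that these are simply averaged before taking the highest-scoring option. The function names (`average_option_probs`, `predict`, `top1_accuracy`) are hypothetical and chosen for illustration; they are not taken from the authors' code.

```python
from typing import List, Sequence


def average_option_probs(runs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-option probabilities over several runs.

    Each run is one (model, augmented view) pair. Plain averaging is an
    assumption here; the source does not describe the combination rule.
    """
    if not runs:
        raise ValueError("need at least one run")
    n_options = len(runs[0])
    if any(len(r) != n_options for r in runs):
        raise ValueError("all runs must score the same number of options")
    return [sum(r[i] for r in runs) / len(runs) for i in range(n_options)]


def predict(runs: Sequence[Sequence[float]]) -> int:
    """Return the index of the option with the highest averaged probability."""
    avg = average_option_probs(runs)
    return max(range(len(avg)), key=avg.__getitem__)


def top1_accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Fraction of questions whose predicted option matches the answer key."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("predictions and answers must be non-empty and equal length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)
```

For example, `predict([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.4, 0.3]])` selects option index 1, even though the first run on its own would have chosen option 0.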

