CV · Mar 21, 2024

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

arXiv:2403.14430v1 · 2 citations · h-index: 20 · CVPR
Originality: Incremental advance
AI Analysis

This work addresses the insufficient-labeling problem in video question answering, a domain-specific challenge for researchers and practitioners in video understanding. It is an incremental advance because it builds on existing distillation techniques.

The paper tackles insufficient labels in open-ended video question answering with a ranking distillation framework (RADI): a teacher model ranks the potential answers, which enriches the label information without extra annotation. Experiments on five benchmarks show that RADI consistently outperforms state-of-the-art methods.

This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question. This is essentially a multi-label classification task, since a question may have multiple answers. However, due to annotation costs, the labels in existing benchmarks are always extremely insufficient, typically one answer per question. As a result, existing works tend to directly treat all the unlabeled answers as negative labels, leading to limited ability for generalization. In this work, we introduce a simple yet effective ranking distillation framework (RADI) to mitigate this problem without additional manual annotation. RADI employs a teacher model trained with incomplete labels to generate rankings for potential answers, which contain rich knowledge about label priority as well as label-associated visual cues, thereby enriching the insufficient labeling information. To avoid overconfidence in the imperfect teacher model, we further present two robust and parameter-free ranking distillation approaches: a pairwise approach which introduces adaptive soft margins to dynamically refine the optimization constraints on various pairwise rankings, and a listwise approach which adopts sampling-based partial listwise learning to resist the bias in teacher ranking. Extensive experiments on five popular benchmarks consistently show that both our pairwise and listwise RADIs outperform state-of-the-art methods. Further analysis demonstrates the effectiveness of our methods on the insufficient labeling problem.
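The abstract names two distillation styles but gives no formulas, so the following is only a minimal sketch of the general idea, not the paper's actual losses. It assumes the teacher outputs per-answer probabilities, uses the teacher's score gap as the "soft margin" of a pairwise hinge loss, and uses a Plackett-Luce (ListMLE-style) likelihood over a randomly sampled sub-list for the listwise variant. All function names and these specific loss choices are hypothetical.

```python
import math
import random


def teacher_ranking(teacher_scores, top_k):
    """Indices of the top_k answers by teacher score, best first."""
    order = sorted(range(len(teacher_scores)),
                   key=lambda i: teacher_scores[i], reverse=True)
    return order[:top_k]


def pairwise_soft_margin_loss(student_scores, teacher_scores, ranking):
    """Hinge loss over all ranked pairs.

    Assumption: the margin for a pair is the teacher's score gap, so pairs
    the teacher is unsure about impose only a weak constraint.
    """
    total, pairs = 0.0, 0
    for a in range(len(ranking)):
        for b in range(a + 1, len(ranking)):
            hi, lo = ranking[a], ranking[b]
            margin = teacher_scores[hi] - teacher_scores[lo]
            gap = student_scores[hi] - student_scores[lo]
            total += max(0.0, margin - gap)
            pairs += 1
    return total / pairs if pairs else 0.0


def partial_listwise_loss(student_scores, ranking, sample_size, rng):
    """Plackett-Luce negative log-likelihood on a sampled sub-list.

    Assumption: sampling a subset of ranked answers (kept in teacher order)
    stands in for "sampling-based partial listwise learning".
    """
    positions = sorted(rng.sample(range(len(ranking)), sample_size))
    sub = [ranking[p] for p in positions]
    loss = 0.0
    for i in range(len(sub)):
        tail = [student_scores[j] for j in sub[i:]]
        m = max(tail)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_z - student_scores[sub[i]]
    return loss / len(sub)


# Toy example: four candidate answers, teacher prefers answer 1.
teacher = [0.10, 0.70, 0.15, 0.05]
ranking = teacher_ranking(teacher, top_k=3)   # answers 1, 2, 0
agree = [0.0, 2.0, 1.0, -1.0]                 # student matches teacher order
flip = [2.0, 0.0, 1.0, -1.0]                  # student reverses it
rng = random.Random(0)
print(pairwise_soft_margin_loss(agree, teacher, ranking))
print(pairwise_soft_margin_loss(flip, teacher, ranking))
print(partial_listwise_loss(agree, ranking, 3, rng))
print(partial_listwise_loss(flip, ranking, 3, rng))
```

In this toy setup a student that agrees with the teacher's order incurs a lower loss than one that reverses it under both variants. In practice such a term would presumably be combined with the usual supervised loss on the single annotated answer.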

