CV · Mar 21, 2024

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu

arXiv:2403.14430v13.72 citations · h-index: 20 · CVPR

Originality: Incremental advance

AI Analysis

This work addresses the insufficient labeling problem in video question answering, a domain-specific challenge for researchers and practitioners in video understanding. The advance is incremental, since it builds on existing distillation techniques.

The paper tackles the problem of insufficient labels in open-ended video question answering by proposing a ranking distillation framework (RADI), in which a teacher model ranks the potential answers, enriching the labeling information without extra annotation. Experiments on five benchmarks show that RADI consistently outperforms state-of-the-art methods.

This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question. This is essentially a multi-label classification task, since a question may have multiple answers. However, due to annotation costs, the labels in existing benchmarks are always extremely insufficient, typically one answer per question. As a result, existing works tend to directly treat all the unlabeled answers as negative labels, leading to limited ability for generalization. In this work, we introduce a simple yet effective ranking distillation framework (RADI) to mitigate this problem without additional manual annotation. RADI employs a teacher model trained with incomplete labels to generate rankings for potential answers, which contain rich knowledge about label priority as well as label-associated visual cues, thereby enriching the insufficient labeling information. To avoid overconfidence in the imperfect teacher model, we further present two robust and parameter-free ranking distillation approaches: a pairwise approach which introduces adaptive soft margins to dynamically refine the optimization constraints on various pairwise rankings, and a listwise approach which adopts sampling-based partial listwise learning to resist the bias in teacher ranking. Extensive experiments on five popular benchmarks consistently show that both our pairwise and listwise RADIs outperform state-of-the-art methods. Further analysis demonstrates the effectiveness of our methods on the insufficient labeling problem.
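The abstract names the two distillation variants but does not give their formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the teacher and student each output one score per candidate answer. The function names, the choice of the teacher's score gap as the "adaptive soft margin", and the use of a ListMLE-style (Plackett-Luce) loss on a random sub-list are all assumptions made for illustration; the sample size `k` in particular is a hypothetical knob, whereas the paper describes its approaches as parameter-free.

```python
import math
import random


def pairwise_soft_margin_loss(student, teacher):
    """Hinge loss over every answer pair that the teacher ranks i above j.

    Assumption for this sketch: the margin of a pair is the teacher's own
    score gap, so pairs the teacher barely separates impose a weak constraint.
    """
    loss, n_pairs = 0.0, 0
    for i in range(len(teacher)):
        for j in range(len(teacher)):
            if teacher[i] > teacher[j]:
                margin = teacher[i] - teacher[j]
                loss += max(0.0, margin - (student[i] - student[j]))
                n_pairs += 1
    return loss / n_pairs if n_pairs else 0.0


def partial_listwise_loss(student, teacher, k, rng):
    """ListMLE-style negative log-likelihood on k randomly sampled answers.

    The sampled answers are ordered by teacher score; the student is then
    penalised by the Plackett-Luce likelihood of reproducing that order.
    Sampling a sub-list means no single teacher ranking is trusted in full.
    """
    idx = rng.sample(range(len(teacher)), k)
    idx.sort(key=lambda i: teacher[i], reverse=True)
    scores = [student[i] for i in idx]
    loss = 0.0
    for pos in range(len(scores)):
        tail = scores[pos:]
        m = max(tail)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_norm - scores[pos]
    return loss


# Toy scores for three candidate answers (made-up numbers).
teacher_scores = [0.9, 0.5, 0.1]
agreeing_student = [0.9, 0.5, 0.1]
reversed_student = [0.1, 0.5, 0.9]

pair_agree = pairwise_soft_margin_loss(agreeing_student, teacher_scores)
pair_reversed = pairwise_soft_margin_loss(reversed_student, teacher_scores)

rng = random.Random(0)
list_agree = partial_listwise_loss(agreeing_student, teacher_scores, 3, rng)
list_reversed = partial_listwise_loss(reversed_student, teacher_scores, 3, rng)
```

In this toy setup a student that reproduces the teacher's ordering incurs zero pairwise loss and a lower listwise loss than a student that reverses it. In practice either term would presumably be combined with the usual supervised loss on the single annotated answer.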
