MM Sep 4, 2021

What Matters for Ad-hoc Video Search? A Large-scale Evaluation on TRECVID

Aozhu Chen, Fan Hu, Zihan Wang, Fangming Zhou, Xirong Li

arXiv:2109.01774v25.97 citations
h-index: 41

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for component-wise analysis in AVS evaluations, providing insights for researchers and practitioners to develop better solutions, though it is incremental as it systematically compares existing components rather than introducing new methods.

The paper tackles the problem of understanding which components influence performance in Ad-hoc Video Search (AVS) by conducting a large-scale evaluation on TRECVID datasets from 2016-2020, constructing 25 solutions with different combinations of models, features, and training data to reveal key factors.

For quantifying progress in Ad-hoc Video Search (AVS), the annual TRECVID AVS task is an important international evaluation. Solutions submitted by the task participants vary in terms of their choices of cross-modal matching models, visual features and training data. As such, what one may conclude from the evaluation is at a high level that is insufficient to reveal the influence of the individual components. In order to bridge the gap between the current solution-level comparison and the desired component-wise comparison, we propose in this paper a large-scale and systematic evaluation on TRECVID. By selecting combinations of state-of-the-art matching models, visual features and (pre-)training data, we construct a set of 25 different solutions and evaluate them on the TRECVID AVS tasks 2016-2020. The presented evaluation helps answer the key question of what matters for AVS. The resultant observations and learned lessons are also instructive for developing novel AVS solutions.
