Insights on the V3C2 Dataset
The paper addresses the need for common datasets in video research by offering a detailed analysis of V3C2, though its contribution is incremental because it covers only one shard of an existing dataset.
The paper analyzes V3C2, the second of three shards of the Vimeo Creative Commons Collection (V3C), which contains roughly 3800 hours of video in total, to provide insights and simplify the dataset's use in research areas such as video retrieval.
For research results to be comparable, it is important to have common datasets for experimentation and evaluation. The size of such datasets, however, can be an obstacle to their use. The Vimeo Creative Commons Collection (V3C) is a video dataset designed to be representative of video content found on the web, containing roughly 3800 hours of video in total, split into three shards. In this paper, we present insights on the second of these shards (V3C2) and discuss their implications for research areas, such as video retrieval, for which the dataset might be particularly useful. We also provide all the extracted data in order to simplify the use of the dataset.