Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos
This dataset addresses the need for paired first and third-person video data in computer vision research, enabling tasks such as classification, localization, and captioning. Its contribution is incremental in that it builds on the existing Charades methodology.
The authors introduced Charades-Ego, a large-scale dataset linking first and third-person videos. It contains 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available.
In Actor and Observer we introduced a dataset linking the first and third-person video understanding domains, the Charades-Ego Dataset. In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available. Charades-Ego furthermore shares activity classes, scripts, and methodology with the Charades dataset, which consists of an additional 82.3 hours of third-person video with 66,500 activity instances. Charades-Ego has temporal annotations and textual descriptions, making it suitable for egocentric video classification, localization, and captioning, as well as new tasks that exploit the cross-modal nature of the data.
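To make the idea of temporal annotations concrete, the following is a minimal sketch of how activity instances with start and end times might be parsed and summarized. It assumes each video's annotation is a semicolon-separated string of "class start end" triples, in the style of the Charades annotation files; the exact field layout should be checked against the released files. The names `ActivityInstance`, `parse_actions`, and `total_annotated_seconds`, and the example string, are hypothetical and chosen for illustration only.

```python
from typing import List, NamedTuple


class ActivityInstance(NamedTuple):
    """One temporally localized activity (hypothetical container)."""
    label: str    # activity class identifier, e.g. "c092"
    start: float  # start time in seconds
    end: float    # end time in seconds


def parse_actions(field: str) -> List[ActivityInstance]:
    """Parse an ASSUMED 'class start end;class start end' annotation string."""
    instances = []
    for chunk in field.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty pieces, e.g. for a video with no annotations
        label, start, end = chunk.split()
        instances.append(ActivityInstance(label, float(start), float(end)))
    return instances


def total_annotated_seconds(instances: List[ActivityInstance]) -> float:
    """Sum the durations of all instances (overlapping spans count twice)."""
    return sum(max(0.0, inst.end - inst.start) for inst in instances)


# Illustrative annotation string, not taken from the dataset.
example = "c092 11.90 21.20;c147 0.00 12.60"
parsed = parse_actions(example)
total = total_annotated_seconds(parsed)
```

Because a single video can contain several overlapping activities, summed instance durations can exceed the video length, which is why instance counts and hours of video are reported separately.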