Fine-grained Activities of People Worldwide
This work addresses the limited data diversity in fine-grained human activity recognition, enabling more ethical AI insights into daily life patterns worldwide. The contribution is incremental, since it focuses on data collection rather than novel methods.
The paper tackles the lack of diverse fine-grained activity recognition datasets by introducing the CAP dataset, which contains 1.45M video clips of 512 activities from 33 countries, and provides benchmarks showing baseline results for classification and detection.
Every day, humans perform many closely related activities that involve subtle discriminative motions, such as putting on a shirt vs. putting on a jacket, or shaking hands vs. giving a high five. Activity recognition by ethical visual AI could provide insights into our patterns of daily life; however, existing activity recognition datasets do not capture the massive diversity of these human activities around the world. To address this limitation, we introduce Collector, a free mobile app to record video while simultaneously annotating objects and activities of consented subjects. This new data collection platform was used to curate the Consented Activities of People (CAP) dataset, the first large-scale, fine-grained activity dataset of people worldwide. The CAP dataset contains 1.45M video clips of 512 fine-grained activity labels of daily life, collected by 780 subjects in 33 countries. We provide activity classification and activity detection benchmarks for this dataset, and analyze baseline results to gain insight into how people around the world perform common activities. The dataset, benchmarks, evaluation tools, public leaderboards and mobile apps are available for use at visym.github.io/cap.