An Evaluation of Action Recognition Models on EPIC-Kitchens
This work provides a baseline evaluation for researchers working on egocentric action recognition, though the contribution is incremental, since it applies existing methods to a new dataset.
The authors benchmarked existing action recognition models (TSN, TRN, TSM) on the EPIC-Kitchens dataset, which features egocentric videos of daily activities, to assess performance on challenges such as a long-tail class distribution and unseen test environments. They did not report specific numerical results.
We benchmark contemporary action recognition models (TSN, TRN, and TSM) on the recently introduced EPIC-Kitchens dataset and release pretrained models on GitHub (https://github.com/epic-kitchens/action-models) for others to build upon. In contrast to popular action recognition datasets like Kinetics, Something-Something, UCF101, and HMDB51, EPIC-Kitchens is shot from an egocentric perspective and captures daily actions in situ. In this report, we aim to understand how well these models can tackle the challenges present in this dataset, such as its long-tail class distribution, unseen environment test set, and multiple tasks (verb, noun, and action classification). We discuss the models' shortcomings and avenues for future research.
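To make the long-tail and multi-task challenges concrete, the following minimal sketch compares overall top-1 accuracy with mean per-class recall on toy labels, and scores an action as correct only when both its verb and its noun are correct. The labels, the function names, and the rule that an action is a (verb, noun) pair are illustrative assumptions, not the report's evaluation code or results.

```python
from collections import Counter

def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_class_recall(y_true, y_pred):
    """Average of per-class recall, so rare classes weigh as much as frequent ones."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def action_accuracy(verb_true, verb_pred, noun_true, noun_pred):
    """Assumption: an action counts as correct only if verb AND noun are both correct."""
    pairs = zip(verb_true, verb_pred, noun_true, noun_pred)
    return sum(vt == vp and nt == np_ for vt, vp, nt, np_ in pairs) / len(verb_true)

# Hypothetical toy labels with a long-tailed verb distribution ("take" dominates).
verb_true = ["take"] * 8 + ["peel", "whisk"]
verb_pred = ["take"] * 10  # a model that always predicts the head class
noun_true = ["plate"] * 5 + ["knife"] * 3 + ["carrot", "egg"]
noun_pred = ["plate"] * 5 + ["knife"] * 2 + ["plate", "carrot", "plate"]

print("verb top-1 accuracy:   ", top1_accuracy(verb_true, verb_pred))
print("verb mean class recall:", mean_class_recall(verb_true, verb_pred))
print("noun top-1 accuracy:   ", top1_accuracy(noun_true, noun_pred))
print("action accuracy:       ", action_accuracy(verb_true, verb_pred, noun_true, noun_pred))
```

On these toy labels, always predicting the head verb still gives a high top-1 accuracy while mean class recall is much lower, which is why class-averaged metrics are often reported alongside accuracy on long-tailed data. Action accuracy can be no higher than either the verb or the noun accuracy under the stated assumption.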