"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021
This work addresses data-efficient action recognition and is aimed at computer vision researchers; its contribution is incremental, since it combines existing methods.
The paper tackles action recognition on a small dataset (Kinetics400ViPriors) without extra data, achieving 73% accuracy on the test set and winning first place in the VIPriors21 challenge.
This technical report presents our approach, "Knights", to the action recognition task on a small subset of Kinetics-400, i.e., Kinetics400ViPriors, without using any extra data. Our approach has three main components: state-of-the-art Temporal Contrastive self-supervised pretraining, video transformer models, and the optical flow modality. Together with standard test-time augmentation, our proposed solution achieves 73% accuracy on the Kinetics400ViPriors test set, the best among all entries in the Visual Inductive Priors for Data-Efficient Computer Vision Action Recognition Challenge, ICCV 2021.
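The abstract does not say how predictions from the RGB and optical flow models are combined, or how test-time augmentation views are aggregated. As an illustration only, the following minimal sketch assumes a common late-fusion scheme: class probabilities are averaged over the augmented views of each modality and then mixed with a weight. The function name fuse_predictions, the flow_weight parameter, and the random stand-in logits are hypothetical and are not taken from the report.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_predictions(rgb_logits, flow_logits, flow_weight=0.5):
    """Late fusion of two modalities with test-time augmentation.

    Both inputs have shape (num_views, num_classes), one row per
    augmented view (e.g. a different crop or clip) of the same video.
    The averaging scheme and flow_weight are assumptions made for this
    sketch, not the method described in the report.
    """
    # Test-time augmentation: average class probabilities over views.
    rgb_probs = softmax(rgb_logits).mean(axis=0)
    flow_probs = softmax(flow_logits).mean(axis=0)
    # Late fusion: convex combination of the two modality scores.
    return (1.0 - flow_weight) * rgb_probs + flow_weight * flow_probs


# Random logits stand in for real model outputs: 4 views, 400 classes.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 400))
flow = rng.normal(size=(4, 400))

fused = fuse_predictions(rgb, flow)
predicted_class = int(fused.argmax())
```

In this sketch, flow_weight would typically be tuned on a validation split; setting it to 0.0 reduces the prediction to the RGB model alone.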