TransAction: ICL-SJTU Submission to EPIC-Kitchens Action Anticipation Challenge 2021
This work addresses action anticipation for video understanding, but it is incremental as it builds on existing Transformer-based methods for a specific challenge.
The paper tackled action anticipation in videos by developing a hierarchical attention model, achieving a Mean Top-5 Recall of 13.39% overall and ranking 1st in verb class across all subsets.
This report gives the technical details of our submission to the EPIC-Kitchens Action Anticipation Challenge 2021. We developed a hierarchical attention model for action anticipation, which uses a Transformer-based attention mechanism to aggregate features across the temporal dimension, across modalities, and across symbiotic branches, respectively. In terms of Mean Top-5 Recall for action, our submission (team name ICL-SJTU) achieved 13.39% on the overall test set, 10.05% on the unseen subset, and 11.88% on the tail subset. Notably, our submission also ranked 1st for the verb class on all three (sub)sets.
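To make the reported metric concrete, the following is a minimal sketch of a class-mean top-k recall computation, assuming the metric is the per-class top-k recall averaged over the classes present in the ground truth. The function name `mean_top_k_recall` and the toy arrays are illustrative assumptions; the challenge's official evaluation code may differ, for example in which classes it includes.

```python
import numpy as np


def mean_top_k_recall(scores, labels, k=5):
    """Class-mean top-k recall (hypothetical helper, not the official evaluator).

    scores: (num_samples, num_classes) array of predicted class scores.
    labels: (num_samples,) array of ground-truth class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k highest-scoring classes for each sample.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    # A sample is a hit if its true class is among its top-k predictions.
    hit = (top_k == labels[:, None]).any(axis=1)
    # Recall is computed per class, then averaged so rare classes count equally.
    recalls = [hit[labels == c].mean() for c in np.unique(labels)]
    return float(np.mean(recalls))


# Toy example: 4 samples, 3 classes.
toy_scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
toy_labels = [0, 1, 1, 2]
top1 = mean_top_k_recall(toy_scores, toy_labels, k=1)
top2 = mean_top_k_recall(toy_scores, toy_labels, k=2)
```

Averaging per class rather than per sample keeps frequent classes from dominating the score, which matters for long-tailed label distributions such as the tail subset mentioned above.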