CV · Jul 4, 2023

Technical Report for Ego4D Long Term Action Anticipation Challenge 2023

arXiv:2307.01467v1 · 10 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses action prediction in egocentric videos, but it is incremental as it builds on existing methods with specific enhancements.

The authors tackled the Ego4D Long-Term Action Anticipation Challenge by introducing three improvements to a baseline model: model ensembling, label smoothing, and word co-occurrence constraints. The resulting method placed second on the public leaderboard.

In this report, we describe the technical details of our approach for the Ego4D Long-Term Action Anticipation Challenge 2023. The aim of this task is to predict, given an input video, the sequence of future actions that will take place from an arbitrary point in time onward. The baseline model consists of an encoder that generates clip-level features from the video, an aggregator that integrates multiple clip-level features, and a decoder that outputs Z future actions. To accomplish this task, we introduce three improvements to this baseline: 1) a model ensemble of SlowFast and SlowFast-CLIP; 2) label smoothing to relax order constraints for future actions; 3) constraining the prediction of the action class (verb, noun) based on word co-occurrence. Our method outperformed the baseline and ranked second on the public leaderboard.
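The abstract names the three improvements without giving their details, so the following is only a minimal sketch of how each idea could look, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, probability averaging as the ensembling rule, spreading the smoothing mass `eps` over the labels of neighbouring time steps within `window`, and a boolean verb-noun mask `cooc` that is assumed to allow at least one pair.

```python
import numpy as np


def ensemble_probs(prob_list, weights=None):
    """Weighted average of per-model class probabilities, each of shape (Z, C)."""
    stacked = np.stack(prob_list)                      # (M, Z, C)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                  # normalise to sum to 1
    return np.tensordot(weights, stacked, axes=1)      # (Z, C)


def order_relaxed_targets(labels, num_classes, eps=0.1, window=1):
    """Soft targets for Z future steps (hypothetical smoothing scheme).

    Step z keeps 1 - eps on its own label; eps is shared equally among the
    labels of the steps within `window` of z, so predicting a nearby action
    slightly out of order is penalised less than an unrelated one.
    """
    num_steps = len(labels)
    targets = np.zeros((num_steps, num_classes))
    for z in range(num_steps):
        targets[z, labels[z]] += 1.0 - eps
        lo, hi = max(0, z - window), min(num_steps, z + window + 1)
        neighbours = [labels[k] for k in range(lo, hi) if k != z]
        if neighbours:
            for c in neighbours:
                targets[z, c] += eps / len(neighbours)
        else:
            targets[z, labels[z]] += eps               # no neighbours: keep a hard label
    return targets


def constrained_decode(verb_probs, noun_probs, cooc):
    """Pick the most probable (verb, noun) pair among pairs allowed by `cooc`.

    `cooc` is a boolean (V, N) mask, e.g. True where the pair was seen
    together in the training annotations (an assumption of this sketch).
    """
    joint = verb_probs[:, None] * noun_probs[None, :]  # (V, N) joint scores
    joint = np.where(cooc, joint, -1.0)                # rule out unseen pairs
    v, n = np.unravel_index(np.argmax(joint), joint.shape)
    return int(v), int(n)
```

In this sketch the ensemble would be applied to the verb and noun probabilities separately before decoding, and the soft targets would replace one-hot labels in a cross-entropy loss during training.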

Code Implementations: 1 repo
