CV, LG, MM | Aug 24, 2023

Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition

arXiv:2308.12673v2 | h-index: 37
Originality: Incremental advance
AI Analysis

This paper addresses video event recognition, a computer vision task, but the contribution appears incremental because it builds on an existing architecture (ViGAT).

The paper tackles unsupervised pre-training for video event recognition by introducing Masked Feature Modelling (MFM), which reconstructs masked object features using a Graph Attention Network block; experiments on YLI-MED show improved accuracy.

In this paper, we introduce Masked Feature Modelling (MFM), a novel approach for the unsupervised pre-training of a Graph Attention Network (GAT) block. MFM utilizes a pretrained Visual Tokenizer to reconstruct masked features of objects within a video, leveraging the MiniKinetics dataset. We then incorporate the pre-trained GAT block into a state-of-the-art bottom-up supervised video-event recognition architecture, ViGAT, to improve the model's starting point and overall accuracy. Experimental evaluations on the YLI-MED dataset demonstrate the effectiveness of MFM in improving event recognition performance.
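
The masking-and-reconstruction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the zero-vector mask token, the single-head dot-product attention layer (a simplified stand-in for the GAT block), and the use of the original features as the reconstruction target are all assumptions made for illustration. The paper's Visual Tokenizer and its actual training objective are not modelled here.

```python
import numpy as np

def mask_features(feats, mask_ratio, rng):
    """Replace a random subset of object feature rows with a zero 'mask token' (assumed)."""
    n = feats.shape[0]
    n_masked = max(1, int(round(mask_ratio * n)))
    idx = rng.choice(n, size=n_masked, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    masked = feats.copy()
    masked[mask] = 0.0
    return masked, mask

def attention_block(x, w_q, w_k, w_v):
    """Single-head dot-product attention over a fully connected object graph.

    Hypothetical stand-in for the GAT block; the real block may differ.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v

def reconstruction_loss(pred, target, mask):
    """Mean squared error computed only on the masked objects."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Toy data: 8 objects in one frame, each with a 16-dimensional feature vector.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
masked, mask = mask_features(feats, mask_ratio=0.5, rng=rng)

# Randomly initialised projection weights; pre-training would update these.
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(16, 16)) for _ in range(3))
pred = attention_block(masked, w_q, w_k, w_v)
loss = reconstruction_loss(pred, feats, mask)
```

In a real pre-training loop, the loss would be minimised with gradient descent over many videos, and the trained weights would then initialise the corresponding block in the supervised model.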
