CV | Dec 17, 2021

Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition

arXiv:2112.09690v2 | 63 citations
Originality: Incremental advance
AI Analysis

This work addresses the high annotation cost of action recognition in video analysis, proposing a pseudo-labeling method that is an incremental advance over existing approaches.

The paper tackles semi-supervised action recognition by proposing Cross-Model Pseudo-Labeling (CMPL), in which two models generate pseudo-labels for each other. With 1% labeled data, CMPL reaches 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101, respectively, outperforming FixMatch by 9.0% and 10.3%.

Semi-supervised action recognition is a challenging but important task due to the high cost of data annotation. A common approach to this problem is to assign pseudo-labels to unlabeled data, which are then used as additional supervision in training. In recent work, the pseudo-labels are typically obtained by training a model on the labeled data and then using confident predictions from the model to teach itself. In this work, we propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL). Concretely, we introduce a lightweight auxiliary network in addition to the primary backbone, and ask them to predict pseudo-labels for each other. We observe that, due to their different structural biases, these two models tend to learn complementary representations from the same video clips. Each model can thus benefit from its counterpart by utilizing cross-model predictions as supervision. Experiments on different data partition protocols demonstrate the significant improvement of our framework over existing alternatives. For example, CMPL achieves 17.6% and 25.1% Top-1 accuracy on Kinetics-400 and UCF-101 using only the RGB modality and 1% labeled data, outperforming our baseline model, FixMatch, by 9.0% and 10.3%, respectively.
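To make the cross-model idea concrete, here is a minimal sketch of the loss on unlabeled clips in plain Python. It assumes, as in the FixMatch baseline, that pseudo-labels come from confident predictions on a weakly augmented clip and supervise predictions on a strongly augmented view of the same clip, with a confidence threshold of 0.95. The weak/strong split, the threshold value, and all function names (`pseudo_label`, `cross_model_unlabeled_loss`, etc.) are illustrative assumptions, not the authors' code. The supervised loss on labeled clips, which both models would also receive, is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits, threshold):
    """Hard pseudo-label plus a keep flag (assumed FixMatch-style confidence mask)."""
    probs = softmax(logits)
    conf = max(probs)
    return probs.index(conf), conf >= threshold


def cross_entropy(logits, target):
    """Cross-entropy of one prediction against an integer class target."""
    return -math.log(softmax(logits)[target])


def cross_model_unlabeled_loss(p_weak, p_strong, a_weak, a_strong, threshold=0.95):
    """Unlabeled loss in which each model is taught by the OTHER model.

    p_* are logits from the primary backbone, a_* from the auxiliary network.
    *_weak: logits on weakly augmented clips, used only to make pseudo-labels
            (a real training framework would detach these from the graph).
    *_strong: logits on strongly augmented clips, trained on the pseudo-labels.
    Returns (loss for primary, loss for auxiliary), each averaged over the batch.
    """
    n = len(p_weak)
    loss_primary = 0.0
    loss_aux = 0.0
    for pw, ps, aw, a_s in zip(p_weak, p_strong, a_weak, a_strong):
        # The auxiliary network's confident prediction supervises the primary.
        y_aux, keep_aux = pseudo_label(aw, threshold)
        if keep_aux:
            loss_primary += cross_entropy(ps, y_aux)
        # The primary backbone's confident prediction supervises the auxiliary.
        y_pri, keep_pri = pseudo_label(pw, threshold)
        if keep_pri:
            loss_aux += cross_entropy(a_s, y_pri)
    # Masked-out clips contribute zero but still count in n, as in FixMatch.
    return loss_primary / n, loss_aux / n


# One unlabeled clip, three classes: only the primary model is confident (class 0).
lp, la = cross_model_unlabeled_loss(
    p_weak=[[5.0, 0.0, 0.0]], p_strong=[[1.0, 2.0, 3.0]],
    a_weak=[[0.0, 1.0, 0.0]], a_strong=[[0.0, 0.0, 0.0]],
)
print(lp, la)  # 0.0 for the primary; roughly 1.0986 (ln 3) for the auxiliary
```

The only change from self-training is which model produces the label: a self-training loop would train `p_strong` on `pseudo_label(pw, ...)` from the same model, whereas here the labels cross over, so each model is supervised by a counterpart with different structural biases.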
