CV · Nov 25, 2025

Smooth regularization for efficient video recognition

arXiv:2511.20928v2 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient video recognition for resource-constrained applications, offering incremental improvements over existing lightweight models.

The paper aims to improve video recognition accuracy in lightweight models. It proposes a smooth regularization technique that instills a temporal inductive bias, yielding accuracy improvements of 3.8% to 6.4% on Kinetics-600 and new state-of-the-art results within the respective FLOP constraints.

We propose a smooth regularization technique that instills a strong temporal inductive bias in video recognition models, particularly benefiting lightweight architectures. Our method encourages smoothness in the intermediate-layer embeddings of consecutive frames by modeling their changes as a Gaussian Random Walk (GRW). This penalizes abrupt representational shifts, thereby promoting low-acceleration solutions that better align with the natural temporal coherence inherent in videos. By leveraging this enforced smoothness, lightweight models can more effectively capture complex temporal dynamics. Applied to such models, our technique yields a 3.8% to 6.4% accuracy improvement on Kinetics-600. Notably, the MoViNets model family trained with our smooth regularization improves the current state of the art by 3.8% to 6.1% within their respective FLOP constraints, while MobileNetV3 and the MoViNets-Stream family achieve gains of 4.9% to 6.4% over prior state-of-the-art models with comparable memory footprints. Our code and models are available at https://github.com/cmusatyalab/grw-smoothing.
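A minimal sketch of what such a smoothness penalty could look like, assuming (this is our reading, not something the abstract spells out) that the GRW is placed on the frame-to-frame changes of the embeddings. Under that assumption each random-walk step is the discrete acceleration e_t - 2e_{t-1} + e_{t-2}, and the Gaussian negative log-likelihood reduces, up to a constant, to a sum of squared accelerations. The function name `grw_smoothness_penalty`, the noise scale `sigma`, and the plain-list embedding format are hypothetical; the authors' actual implementation is in the linked repository.

```python
from typing import Sequence

# Hypothetical format: one embedding vector per frame, as a sequence of floats.
Vector = Sequence[float]


def grw_smoothness_penalty(embeddings: Sequence[Vector], sigma: float = 1.0) -> float:
    """Sketch of a GRW-style smoothness penalty (assumed formulation).

    If the velocity v_t = e_t - e_{t-1} follows a Gaussian random walk,
    v_t = v_{t-1} + eps_t with eps_t ~ N(0, sigma^2 I), then
    eps_t = e_t - 2 e_{t-1} + e_{t-2} (the discrete acceleration), and the
    negative log-likelihood, dropping constants, is sum_t ||eps_t||^2 / (2 sigma^2).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # All frames must share one embedding dimension; zip() would truncate silently.
    if len({len(e) for e in embeddings}) > 1:
        raise ValueError("all frame embeddings must have the same dimension")
    if len(embeddings) < 3:
        return 0.0  # at least three frames are needed to form one acceleration term

    total = 0.0
    for t in range(2, len(embeddings)):
        e_prev2, e_prev1, e_curr = embeddings[t - 2], embeddings[t - 1], embeddings[t]
        # Squared L2 norm of the discrete acceleration at frame t.
        total += sum((c - 2.0 * b + a) ** 2 for a, b, c in zip(e_prev2, e_prev1, e_curr))
    return total / (2.0 * sigma ** 2)
```

During training, a penalty like this would typically be added to the task loss with a weighting coefficient, e.g. `loss = cross_entropy + lam * grw_smoothness_penalty(frame_embeddings)`, where `lam` is likewise an assumption. A constant-velocity trajectory such as `[[0, 0], [1, 1], [2, 2]]` incurs zero penalty, while an abrupt reversal such as `[[0.0], [1.0], [0.0]]` costs 2.0 at `sigma=1.0`. This illustrates how the penalty favors low-acceleration embedding trajectories without constraining their velocity.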
