CV | Mar 25

CAKE: Real-time Action Detection via Motion Distillation and Background-aware Contrastive Learning

arXiv:2603.23988 | 17.1 | h-index: 2
AI Analysis

This work addresses real-time action detection for resource-constrained systems, representing an incremental improvement by enhancing efficiency and performance over existing methods.

The paper tackles two challenges in Online Action Detection: high computational cost and insufficient modeling of discriminative temporal dynamics. It proposes CAKE, a framework that combines motion distillation with background-aware contrastive learning. The authors report a standout mAP relative to state-of-the-art methods while running at over 72 FPS on a single CPU.

Online Action Detection (OAD) systems face two primary challenges: high computational cost and insufficient modeling of discriminative temporal dynamics against background motion. Adding optical flow can provide strong motion cues, but it incurs significant computational overhead. We propose CAKE, an OAD flow-based distillation framework that transfers motion knowledge into RGB models. We propose a Dynamic Motion Adapter (DMA) to suppress static background noise and emphasize pixel changes, effectively approximating optical flow without explicit computation. The framework also integrates a Floating Contrastive Learning strategy to distinguish informative motion dynamics from temporal background. Experiments on the TVSeries, THUMOS'14, and Kinetics-400 datasets show the effectiveness of our model. CAKE achieves a standout mAP compared with SOTA methods while using the same backbone. Our model operates at over 72 FPS on a single CPU, making it highly suitable for resource-constrained systems.
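The abstract does not say how the Dynamic Motion Adapter or the distillation objective is implemented. The Python sketch below illustrates only the general idea it describes: deriving a cheap motion cue from RGB frames by emphasizing pixel changes and zeroing static background, then pulling an RGB "student" feature toward a flow "teacher" feature. The function names, the `threshold` parameter, and the choice of a mean-squared-error loss are all assumptions made for illustration, not the paper's method.

```python
import numpy as np


def motion_cue(frames: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Approximate motion from RGB frames by temporal differencing.

    frames: float array of shape (T, H, W, C) with values in [0, 1].
    Returns an array of shape (T - 1, H, W) holding per-pixel change
    magnitude; changes below `threshold` (an assumed hyperparameter)
    are zeroed to suppress static background.
    """
    diffs = np.abs(frames[1:] - frames[:-1]).mean(axis=-1)  # (T-1, H, W)
    return np.where(diffs >= threshold, diffs, 0.0)


def distillation_loss(student_feat: np.ndarray, teacher_feat: np.ndarray) -> float:
    """Assumed objective: MSE between RGB-student and flow-teacher features."""
    return float(np.mean((student_feat - teacher_feat) ** 2))


# Toy clip: a static gray background with one bright pixel moving right.
T, H, W = 4, 8, 8
clip = np.full((T, H, W, 3), 0.5)
for t in range(T):
    clip[t, 3, t] = 1.0

cue = motion_cue(clip)
```

In this toy clip only the pixel the bright dot leaves and the pixel it enters produce a nonzero cue between consecutive frames; the unchanged background contributes nothing. No optical flow is computed at any point, which is the efficiency argument the abstract makes.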

