CV · Dec 3, 2019

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

arXiv:1912.01601v1 · 114 citations
Originality: Incremental advance
AI Analysis

This work addresses resource-efficient video recognition in both online and offline settings, contributing an incremental improvement to dynamic computation allocation.

The paper tackles the computational cost of video recognition by proposing LiteEval, a coarse-to-fine framework that dynamically allocates computation based on frame complexity. On the FCVID and ActivityNet benchmarks, it achieves excellent classification accuracy with substantially less computation.
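One back-of-the-envelope way to read the efficiency claim, under stated assumptions: the lightweight coarse CNN runs on every frame, the fine CNN runs only on frames the gate selects, and the LSTM and gating costs are negligible (an assumption; the paper's own accounting may differ). With $T$ frames, hypothetical per-frame costs $C_c$ (coarse) and $C_f$ (fine), and gate decisions $g_t \in \{0,1\}$, the total cost and its ratio to running the fine model on every frame are:

```latex
\text{cost} = \sum_{t=1}^{T}\left(C_c + g_t\,C_f\right) = T\left(C_c + \bar{g}\,C_f\right),
\qquad \bar{g} = \frac{1}{T}\sum_{t=1}^{T} g_t

\frac{\text{cost}}{T\,C_f} = \frac{C_c}{C_f} + \bar{g}
```

Under these assumptions, savings require $C_c / C_f + \bar{g} < 1$: the coarse model must be much cheaper than the fine one, and the gate must skip the fine model on a substantial fraction of frames.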

This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios. Exploiting decent yet computationally efficient features derived at a coarse scale with a lightweight CNN model, LiteEval dynamically decides on-the-fly whether to compute more powerful features for incoming video frames at a finer scale to obtain more details. This is achieved by a coarse LSTM and a fine LSTM operating cooperatively, as well as a conditional gating module to learn when to allocate more computation. Extensive experiments are conducted on two large-scale video benchmarks, FCVID and ActivityNet, and the results demonstrate LiteEval requires substantially less computation while offering excellent classification accuracy for both online and offline predictions.
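The control flow described in the abstract (cheap features on every frame, a gate that decides whether to pay for fine features) can be sketched as below. This is a minimal, hypothetical illustration rather than the authors' implementation: `ema_update` is a toy exponential moving average standing in for the coarse and fine LSTMs, `gate_prob` is a hand-set sigmoid gate standing in for the learned conditional gating module, and `coarse_to_fine`, `threshold`, `coarse_cost` and `fine_cost` are illustrative names with arbitrary cost units.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def ema_update(state: Vector, feature: Vector, decay: float = 0.5) -> Vector:
    """Toy stand-in for an LSTM step: exponential moving average of features."""
    return [decay * s + (1.0 - decay) * f for s, f in zip(state, feature)]


def gate_prob(state: Vector, weights: Vector, bias: float) -> float:
    """Hypothetical gate: sigmoid of a linear score on the coarse state."""
    score = sum(w * h for w, h in zip(weights, state)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def coarse_to_fine(
    frames: Sequence[Vector],
    coarse_fn: Callable[[Vector], Vector],
    fine_fn: Callable[[Vector], Vector],
    gate_weights: Vector,
    gate_bias: float = 0.0,
    threshold: float = 0.5,
    coarse_cost: float = 1.0,
    fine_cost: float = 10.0,
) -> Tuple[Vector, Vector, List[int], float]:
    """Run the coarse path on every frame; run the fine path only when gated."""
    dim = len(gate_weights)
    coarse_state = [0.0] * dim
    fine_state = [0.0] * dim
    decisions: List[int] = []
    total_cost = 0.0
    for frame in frames:
        # Cheap coarse features are always computed and folded into the coarse state.
        coarse_state = ema_update(coarse_state, coarse_fn(frame))
        total_cost += coarse_cost
        # The gate looks at the coarse state and decides whether fine features are worth it.
        use_fine = gate_prob(coarse_state, gate_weights, gate_bias) > threshold
        decisions.append(int(use_fine))
        if use_fine:
            # Expensive fine features are computed only for gated frames.
            fine_state = ema_update(fine_state, fine_fn(frame))
            total_cost += fine_cost
    return coarse_state, fine_state, decisions, total_cost


# Toy demo: 1-D frames whose value plays the role of "complexity".
frames = [[0.1], [0.2], [3.0], [0.1], [2.5]]
_, _, gates, cost = coarse_to_fine(
    frames,
    coarse_fn=lambda x: [x[0]],
    fine_fn=lambda x: [2.0 * x[0]],
    gate_weights=[2.0],
    gate_bias=-1.0,
)
print(gates, cost)  # [0, 0, 1, 1, 1] 35.0
```

In this toy run the gate skips the two quiet opening frames, so the sequence costs 35 units, versus 50 units for running the fine path on all five frames at the same assumed costs. The fourth frame is still gated on because the moving average carries over the preceding spike; in LiteEval the gate is learned, so this behaviour would depend on training rather than on hand-set weights.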

