CV · Sep 9, 2024

Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space

arXiv:2409.05260v5 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

This paper addresses the computational bottleneck of frame selection in video classification, which matters to both researchers and practitioners. It is an incremental advance, since it builds on existing frame sampling methods.

The paper tackles the problem of efficiently selecting a small subset of frames from a video for classification. It reduces the search space from O(T^N) to O(T) with a semi-optimal policy based on per-frame confidence, and reports stable, high performance across various datasets and model architectures.

Given a video with $T$ frames, frame sampling is the task of selecting $N \ll T$ frames so as to maximize the performance of a fixed video classifier. Not only brute-force search but also most existing methods suffer from the vast search space of $\binom{T}{N}$, especially as $N$ grows. To address this challenge, we introduce a novel perspective of reducing the search space from $O(T^N)$ to $O(T)$. Instead of exploring the entire $O(T^N)$ space, our proposed semi-optimal policy selects the top $N$ frames based on the independently estimated value of each frame using per-frame confidence, significantly reducing the computational complexity. We verify that our semi-optimal policy can efficiently approximate the optimal policy, particularly under practical settings. Additionally, through extensive experiments on various datasets and model architectures, we demonstrate that learning our semi-optimal policy ensures stable and high performance regardless of the size of $N$ and $T$.
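
The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-frame confidence scores are already available as a plain list (in the paper they are estimated by a learned policy), and the function names `search_space_size` and `select_top_n_frames` are hypothetical, chosen only for this example. It contrasts the number of subsets a brute-force search would face with a selection that scores each frame once and keeps the top $N$.

```python
import math
from typing import List, Sequence


def search_space_size(t: int, n: int) -> int:
    """Number of distinct n-frame subsets of a t-frame video (brute-force space)."""
    return math.comb(t, n)


def select_top_n_frames(confidences: Sequence[float], n: int) -> List[int]:
    """Pick the n frames with the highest independent confidence scores.

    `confidences[t]` is an assumed, precomputed score for frame t.
    Returns the chosen frame indices in temporal order.
    """
    if not 0 < n <= len(confidences):
        raise ValueError("n must satisfy 0 < n <= number of frames")
    # Rank frame indices by descending score; sorted() is stable, so ties
    # are broken in favour of the earlier frame.
    ranked = sorted(range(len(confidences)), key=lambda t: -confidences[t])
    # Keep the top n and restore temporal order for the classifier.
    return sorted(ranked[:n])


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.4, 0.8, 0.3, 0.7]  # made-up scores for 6 frames
    print(select_top_n_frames(scores, 3))    # [1, 3, 5]
    print(search_space_size(16, 4))          # 1820
```

Each frame is scored exactly once, so the number of score evaluations grows linearly in $T$; the sort used here for ranking adds an $O(T \log T)$ term, which a partial selection such as `heapq.nlargest` could reduce.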

Code Implementations: 1 repo
