CV · Sep 9, 2024

Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space

Junho Lee, Jeongwoo Shin, Seung Woo Ko, Seongsu Ha, Joonseok Lee

arXiv:2409.05260v53.71 citations · h-index: 6

Originality: Incremental advance

AI Analysis

This paper addresses the computational bottleneck of frame sampling in video classification, which matters to both researchers and practitioners. The advance is incremental because it builds on existing frame sampling methods.

The paper tackles the problem of efficiently selecting a small subset of frames from a video for classification. It reduces the search space from O(T^N) to O(T) by using a semi-optimal policy based on per-frame confidence, which is reported to achieve stable and high performance across various datasets and model architectures.
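To give a sense of the gap between the two search spaces, the short sketch below counts the candidate frame subsets for one illustrative setting. The values `T = 100` and `N = 8` are assumptions chosen for the example, not figures taken from the paper.

```python
import math

T = 100  # hypothetical number of frames in the video
N = 8    # hypothetical number of frames to select

# Number of unordered N-frame subsets a joint search would have to consider.
subsets = math.comb(T, N)

# Number of values to estimate when each frame is scored independently.
per_frame = T

print(subsets)    # 186087894300
print(per_frame)  # 100
```

Even at this modest size, joint search faces roughly 1.9e11 candidate subsets, while independent scoring needs only one value per frame.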

Given a video with $T$ frames, frame sampling is the task of selecting $N \ll T$ frames so as to maximize the performance of a fixed video classifier. Most existing methods, not only brute-force search, suffer from the vast search space of $\binom{T}{N}$ candidate subsets, especially when $N$ gets large. To address this challenge, we introduce a novel perspective that reduces the search space from $O(T^N)$ to $O(T)$. Instead of exploring the entire $O(T^N)$ space, our proposed semi-optimal policy selects the top $N$ frames based on the independently estimated value of each frame, using per-frame confidence, which significantly reduces the computational complexity. We verify that our semi-optimal policy can efficiently approximate the optimal policy, particularly under practical settings. Additionally, through extensive experiments on various datasets and model architectures, we demonstrate that learning our semi-optimal policy ensures stable and high performance regardless of the size of $N$ and $T$.
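The selection step described in the abstract (score each frame independently, then keep the top $N$) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_top_n_frames` is hypothetical, the confidence scores are assumed to be given (how they are estimated is not covered by this summary), and returning the indices in temporal order is an assumption made for convenience.

```python
from typing import List, Sequence


def select_top_n_frames(confidences: Sequence[float], n: int) -> List[int]:
    """Return the indices of the n highest-scoring frames, in temporal order.

    `confidences` holds one independently estimated score per frame
    (assumed to be precomputed; this is a hypothetical interface).
    """
    if not 0 <= n <= len(confidences):
        raise ValueError("n must be between 0 and the number of frames")
    # Rank frame indices by score, highest first. Each frame is scored on its
    # own, so no subset of frames is ever evaluated jointly.
    ranked = sorted(range(len(confidences)),
                    key=lambda i: confidences[i],
                    reverse=True)
    # Keep the top n and restore temporal order (an assumption of this sketch).
    return sorted(ranked[:n])


# Made-up scores for a six-frame video.
scores = [0.10, 0.85, 0.30, 0.92, 0.05, 0.60]
print(select_top_n_frames(scores, 3))  # [1, 3, 5]
```

The sketch needs only $T$ scores plus a sort, so its cost does not grow combinatorially with $N$.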
