CV | May 10, 2024

Deep video representation learning: a survey

arXiv:2405.06574v1 | 7 citations | h-index: 8 | Multimedia Tools and Applications
Originality: Synthesis-oriented
AI Analysis

The paper addresses a fundamental problem in computer vision, building effective video representations, but as a review paper its contribution is incremental.

This survey reviews and classifies spatiotemporal feature learning methods for video analysis, comparing their pros and cons for general tasks.

This paper provides a review of representation learning for videos. We classify recent spatiotemporal feature learning methods for sequential visual data and compare their pros and cons for general video analysis. Building effective features for videos is a fundamental problem in computer vision tasks involving video analysis and understanding. Existing features can generally be categorized into spatial and temporal features. Their effectiveness under variations in illumination, occlusion, view, and background is discussed. Finally, we discuss the remaining challenges in existing deep video representation learning studies.
