CV · Dec 24, 2024

Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos

arXiv:2412.18386v3 · 1 citation · h-index: 21
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of creating engaging how-to videos by automating view selection. The advance is rated incremental because it builds on existing multi-view video analysis methods.

The paper tackles the problem of automatically selecting the optimal viewpoint (egocentric or exocentric) at each timepoint in how-to videos, using a model trained on unlabeled human-edited videos, and demonstrates its effectiveness on real-world datasets like HowTo100M and Ego-Exo4D.

We introduce SWITCH-A-VIEW, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video. The key insight of our approach is how to train such a model from unlabeled -- but human-edited -- video samples. We pose a pretext task that pseudo-labels segments in the training videos for their primary viewpoint (egocentric or exocentric), and then discovers the patterns between the visual and spoken content in a how-to video on the one hand and its view-switch moments on the other hand. Armed with this predictor, our model can be applied to new multi-view video settings for orchestrating which viewpoint should be displayed when, even when such settings come with limited labels. We demonstrate our idea on a variety of real-world videos from HowTo100M and Ego-Exo4D, and rigorously validate its advantages. Project: https://vision.cs.utexas.edu/projects/switch_a_view/.
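To make the notion of a "view-switch moment" concrete, here is a minimal sketch that assumes each fixed-length segment of a video has already been pseudo-labeled with its primary viewpoint. The function name `view_switch_moments`, the label strings "ego" and "exo", and the `segment_seconds` parameter are illustrative assumptions and do not come from the paper; the sketch only shows how switch points could be read off a label sequence, not how the model predicts them.

```python
from typing import List, Tuple


def view_switch_moments(
    labels: List[str], segment_seconds: float = 1.0
) -> List[Tuple[float, str, str]]:
    """Return (time_in_seconds, from_view, to_view) for every boundary
    between consecutive segments whose pseudo-labels differ.

    `labels` holds one hypothetical pseudo-label per segment, e.g. "ego"
    or "exo"; `segment_seconds` is an assumed fixed segment length.
    """
    switches = []
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            # The switch happens at the start of segment i.
            switches.append((i * segment_seconds, labels[i - 1], labels[i]))
    return switches


if __name__ == "__main__":
    # Hypothetical pseudo-labels for six 2-second segments.
    example = ["exo", "exo", "ego", "ego", "ego", "exo"]
    for time_s, src, dst in view_switch_moments(example, segment_seconds=2.0):
        print(f"{time_s:.1f}s: {src} -> {dst}")
```

In this toy sequence the view switches from exocentric to egocentric at 4.0 s and back at 10.0 s. A learned predictor of the kind the abstract describes would instead estimate such switches from the visual and spoken content.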

