CV | Aug 29, 2025

What Can We Learn from Harry Potter? An Exploratory Study of Visual Representation Learning from Atypical Videos

arXiv:2508.21770v2 | h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of improving generalization in open-world video learning for AI systems, though it is incremental as it builds on existing representation learning methods with new data.

The paper tackles open-world visual representation learning by investigating the impact of training on atypical videos, such as sci-fi and animation. It finds that this approach consistently improves performance across out-of-distribution (OOD) detection, novel category discovery, and zero-shot action recognition, and that increasing the categorical diversity of the atypical samples further boosts OOD detection.

Humans usually show exceptional generalisation and discovery ability in the open world when shown uncommon new concepts. Whereas most existing studies in the literature focus on common typical data from closed sets, open-world novel discovery is under-explored in videos. In this paper, we are interested in asking: What if the model is exposed to atypical, unusual videos during learning? To this end, we collect a new video dataset consisting of various types of atypical data (e.g., sci-fi and animation). To study how such atypical data may benefit open-world learning, we feed them into the model training process for representation learning. Focusing on three key tasks in open-world learning: out-of-distribution (OOD) detection, novel category discovery (NCD), and zero-shot action recognition (ZSAR), we found that even straightforward learning approaches with atypical data consistently improve performance across various settings. Furthermore, we found that increasing the categorical diversity of the atypical samples further boosts OOD detection performance. Additionally, in the NCD task, using a smaller yet more semantically diverse set of atypical samples leads to better performance than using a larger but more typical dataset. In the ZSAR setting, the semantic diversity of atypical videos helps the model generalise better to unseen action classes. Together with the newly proposed dataset, these observations from our extensive experimental evaluations reveal the benefits of atypical videos for visual representation learning in the open world and encourage further studies in this direction. The project page is at: https://julysun98.github.io/atypical_dataset.

