CV · LG · Feb 18

Let's Split Up: Zero-Shot Classifier Edits for Fine-Grained Video Understanding

arXiv:2602.16545v1 · h-index: 1
AI Analysis

This work addresses the costly need for new annotations and retraining as video understanding tasks evolve. The contribution is incremental, since it builds on existing classifier structures.

The paper addresses video recognition models being limited to coarse taxonomies. It introduces category splitting, a task in which an existing classifier is edited to refine a coarse category into finer subcategories without retraining, and shows that the proposed zero-shot method improves accuracy on the split categories without degrading performance elsewhere.

Video recognition models are typically trained on fixed taxonomies which are often too coarse, collapsing distinctions in object, manner or outcome under a single label. As tasks and definitions evolve, such models cannot accommodate emerging distinctions and collecting new annotations and retraining to accommodate such changes is costly. To address these challenges, we introduce category splitting, a new task where an existing classifier is edited to refine a coarse category into finer subcategories, while preserving accuracy elsewhere. We propose a zero-shot editing method that leverages the latent compositional structure of video classifiers to expose fine-grained distinctions without additional data. We further show that low-shot fine-tuning, while simple, is highly effective and benefits from our zero-shot initialization. Experiments on our new video benchmarks for category splitting demonstrate that our method substantially outperforms vision-language baselines, improving accuracy on the newly split categories without sacrificing performance on the rest. Project page: https://kaitingliu.github.io/Category-Splitting/.
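To make the category-splitting task concrete, here is a minimal sketch of what editing a classifier head could look like. It is not the paper's method: it assumes a plain linear head, and the names `split_category`, `predict`, and the per-subcategory `offsets` are hypothetical. In particular, the abstract does not say how the fine-grained directions are obtained, so `offsets` stands in for whatever signal separates the subcategories.

```python
def split_category(weights, biases, coarse_idx, offsets):
    """Replace one coarse class row with several subcategory rows.

    weights: list of per-class weight vectors (lists of floats)
    biases: list of per-class biases
    coarse_idx: index of the coarse class to split
    offsets: hypothetical per-subcategory direction vectors (an assumption,
             not taken from the paper); each is added to the coarse row
    """
    base_w = weights[coarse_idx]
    base_b = biases[coarse_idx]
    # Each subcategory starts from the coarse class weights plus its offset.
    new_rows = [[w + d for w, d in zip(base_w, off)] for off in offsets]
    # All other classes are copied unchanged, so their logits are preserved.
    kept_w = weights[:coarse_idx] + weights[coarse_idx + 1:]
    kept_b = biases[:coarse_idx] + biases[coarse_idx + 1:]
    return kept_w + new_rows, kept_b + [base_b] * len(offsets)


def predict(weights, biases, x):
    """Return the index of the highest-scoring class for feature vector x."""
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy head with three classes over 2-D features; class 1 is the coarse one.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b = [0.0, 0.0, 0.0]
W2, b2 = split_category(W, b, coarse_idx=1,
                        offsets=[[0.5, 0.0], [-0.5, 0.0]])
```

After the edit the head has four classes: the two untouched classes keep their original weights (now at indices 0 and 1), and the two new subcategories sit at indices 2 and 3. Appending the new rows at the end is a design choice of this sketch; it shifts the indices of classes that followed the coarse one, so any label mapping would need updating accordingly.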

