vCLIMB: A Novel Video Class Incremental Learning Benchmark
This work provides a standardized benchmark for video continual learning and addresses domain-specific challenges such as frame-level memory selection and untrimmed data. Its methodological contribution is modest, however, since the proposed approach builds on existing memory-based methods.
The authors tackle the lack of a standardized benchmark for video continual learning by introducing vCLIMB, which avoids the imbalanced class distributions of earlier splits, and propose a temporal consistency regularization method that improves over the baseline by up to 24% on the untrimmed task.
Continual learning (CL) is under-explored in the video domain. The few existing works contain splits with imbalanced class distributions over the tasks, or study the problem on unsuitable datasets. We introduce vCLIMB, a novel video continual learning benchmark. vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning. In contrast to previous work, we focus on class incremental continual learning with models trained on a sequence of disjoint tasks, and distribute the number of classes uniformly across the tasks. We perform in-depth evaluations of existing CL methods on vCLIMB, and observe two challenges unique to video data. First, the selection of instances to store in episodic memory is performed at the frame level. Second, untrimmed training data influences the effectiveness of frame sampling strategies. We address these two challenges by proposing a temporal consistency regularization that can be applied on top of memory-based continual learning methods. Our approach significantly improves the baseline by up to 24% on the untrimmed continual learning task.
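The class incremental setup described above, with disjoint tasks and the same number of classes in each, can be illustrated with a minimal sketch. This is not the benchmark's actual split code: the function name `make_class_incremental_tasks`, the seeded shuffle, the placeholder class names, and the choice of 100 classes over 10 tasks are all assumptions made for illustration.

```python
import random


def make_class_incremental_tasks(class_names, num_tasks, seed=0):
    """Partition classes into disjoint tasks of equal size.

    Hypothetical helper: the name, the seeded shuffle, and the strict
    divisibility check are illustrative assumptions, not vCLIMB's code.
    """
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    if len(class_names) % num_tasks != 0:
        raise ValueError("number of classes must divide evenly across tasks")

    # Shuffle a copy so the caller's list is left unchanged.
    shuffled = list(class_names)
    random.Random(seed).shuffle(shuffled)

    # Consecutive, non-overlapping slices give disjoint tasks of equal size.
    per_task = len(shuffled) // num_tasks
    return [shuffled[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]


# Example: 100 placeholder classes split into 10 tasks of 10 classes each.
classes = [f"class_{i}" for i in range(100)]
tasks = make_class_incremental_tasks(classes, num_tasks=10)
```

A model would then be trained on `tasks[0]`, `tasks[1]`, and so on in sequence. Because every task holds the same number of classes, forgetting measured after each task is not confounded by differences in task size.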