CV · Aug 14, 2025

ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning

arXiv:2508.10896v1 · 1 citation · h-index: 13
Originality: Incremental advance
AI Analysis

The paper addresses the memory inefficiency of rehearsal-based video class-incremental learning; it is an incremental improvement for video analysis tasks.

The paper tackles video class-incremental learning by proposing ESSENTIAL, which integrates episodic and semantic memory to cut memory usage while maintaining performance, and reports favorable results on multiple benchmarks.

In this work, we tackle the problem of video class-incremental learning (VCIL). Many existing VCIL methods mitigate catastrophic forgetting by rehearsal training with a few temporally dense samples stored in episodic memory, which is memory-inefficient. Alternatively, some methods store temporally sparse samples, sacrificing essential temporal information and thereby resulting in inferior performance. To address this trade-off between memory-efficiency and performance, we propose EpiSodic and SEmaNTIc memory integrAtion for video class-incremental Learning (ESSENTIAL). ESSENTIAL consists of episodic memory for storing temporally sparse features and semantic memory for storing general knowledge represented by learnable prompts. We introduce a novel memory retrieval (MR) module that integrates episodic memory and semantic prompts through cross-attention, enabling the retrieval of temporally dense features from temporally sparse features. We rigorously validate ESSENTIAL on diverse datasets: UCF-101, HMDB51, and Something-Something-V2 from the TCD benchmark and UCF-101, ActivityNet, and Kinetics-400 from the vCLIMB benchmark. Remarkably, with significantly reduced memory, ESSENTIAL achieves favorable performance on the benchmarks.
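The abstract describes the memory retrieval (MR) module only at a high level, so the following is a minimal sketch of single-head cross-attention under stated assumptions, not the authors' implementation. It assumes that the semantic prompts act as queries (one per dense time step) and that the stored sparse features act as keys and values; the function name `retrieve_dense_features`, the projection matrices, and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def retrieve_dense_features(sparse_feats, prompts, w_q, w_k, w_v):
    """Hypothetical MR-style cross-attention (an assumption, not the paper's code).

    sparse_feats: (T_sparse, D) features kept in episodic memory.
    prompts:      (T_dense, D) learnable semantic prompts, used here as queries.
    Returns (T_dense, D) retrieved features and the (T_dense, T_sparse) weights.
    """
    q = prompts @ w_q          # queries come from semantic memory
    k = sparse_feats @ w_k     # keys come from episodic memory
    v = sparse_feats @ w_v     # values come from episodic memory
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


# Toy sizes, chosen only for the example.
T_SPARSE, T_DENSE, D = 2, 8, 16
rng = np.random.default_rng(0)
sparse_feats = rng.normal(size=(T_SPARSE, D))
prompts = rng.normal(size=(T_DENSE, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

dense_feats, attn = retrieve_dense_features(sparse_feats, prompts, w_q, w_k, w_v)
print(dense_feats.shape)  # (8, 16)
```

In this toy setting the episodic memory holds 2 feature vectors per video instead of 8, and the prompts, which are shared across videos, supply the queries used to expand them back to 8 time steps. In a trained model the prompts and projections would be learned; here they are random placeholders.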

