CV | LG | May 27, 2025

Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets

arXiv:2505.20694v1 | 1 citation | h-index: 4
Originality: Highly original
AI Analysis

This work addresses scalable video dataset compression for machine learning researchers, offering a novel method that reduces computational cost while preserving training performance. The contribution is partly incremental, since it builds on existing dataset distillation paradigms.

The paper tackles the challenge of extending dataset distillation to videos by proposing a temporal saliency-guided framework that optimizes synthetic videos to preserve temporal dynamics. It reports state-of-the-art performance on standard benchmarks, narrowing the gap between real and distilled video data.

Dataset distillation (DD) has emerged as a powerful paradigm for dataset compression, enabling the synthesis of compact surrogate datasets that approximate the training utility of large-scale ones. While significant progress has been achieved in distilling image datasets, extending DD to the video domain remains challenging due to the high dimensionality and temporal complexity inherent in video data. Existing video distillation (VD) methods often suffer from excessive computational costs and struggle to preserve temporal dynamics, as naïve extensions of image-based approaches typically lead to degraded performance. In this paper, we propose a novel uni-level video dataset distillation framework that directly optimizes synthetic videos with respect to a pre-trained model. To address temporal redundancy and enhance motion preservation, we introduce a temporal saliency-guided filtering mechanism that leverages inter-frame differences to guide the distillation process, encouraging the retention of informative temporal cues while suppressing frame-level redundancy. Extensive experiments on standard video benchmarks demonstrate that our method achieves state-of-the-art performance, bridging the gap between real and distilled video data and offering a scalable solution for video dataset compression.
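
As a rough illustration of how inter-frame differences can serve as a temporal saliency signal, the following is a minimal sketch and not the paper's implementation. The function name `temporal_saliency`, the `(T, H, W, C)` array layout, and the choice to normalize the scores into per-frame weights are all assumptions made for this example; the abstract does not specify how the filtering is computed or applied.

```python
import numpy as np


def temporal_saliency(video: np.ndarray) -> np.ndarray:
    """Return per-frame weights (summing to 1) based on inter-frame differences.

    `video` is assumed to have shape (T, H, W, C). This is a hypothetical
    helper for illustration only, not the method from the paper.
    """
    num_frames = video.shape[0]
    if num_frames == 0:
        return np.zeros(0)
    if num_frames == 1:
        return np.ones(1)

    # Mean absolute difference between consecutive frames: shape (T-1,).
    diffs = np.abs(np.diff(video.astype(np.float64), axis=0))
    motion = diffs.reshape(num_frames - 1, -1).mean(axis=1)

    # Frame t (t >= 1) is scored by its change from frame t-1.
    # Frame 0 has no predecessor, so it reuses the first difference.
    scores = np.concatenate([motion[:1], motion])

    total = scores.sum()
    if total == 0.0:
        # A completely static clip: fall back to uniform weights.
        return np.full(num_frames, 1.0 / num_frames)
    return scores / total


# Usage: a 4-frame clip where the only change happens at frame 2.
clip = np.zeros((4, 8, 8, 3))
clip[2:] = 1.0
weights = temporal_saliency(clip)
static_weights = temporal_saliency(np.zeros((3, 8, 8, 3)))
```

In this sketch, frames that differ more from their predecessor receive larger weights, which is one simple way to emphasize motion and down-weight redundant frames. How such weights would enter a distillation objective is left open here.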

