CV AI LG · Nov 18, 2024

Distill the Best, Ignore the Rest: Improving Dataset Distillation with Loss-Value-Based Pruning

Brian B. Moser, Federico Raue, Tobias C. Nauen, Stanislav Frolov, Andreas Dengel

arXiv:2411.12115v1 · 7.66 citations · h-index: 13 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a known challenge in dataset distillation, generalization to unseen architectures, though the advance appears incremental because it builds on existing distillation techniques.

The paper introduces a pruning framework that removes non-beneficial samples before distillation, yielding an accuracy increase of up to 5.2 percentage points even after pruning 80% of the dataset.

Dataset distillation has gained significant interest in recent years, yet existing approaches typically distill from the entire dataset, potentially including non-beneficial samples. We introduce a novel "Prune First, Distill After" framework that systematically prunes datasets via loss-based sampling prior to distillation. By applying pruning before classical distillation techniques and generative priors, we create a representative core-set that leads to enhanced generalization for unseen architectures, a significant challenge for current distillation methods. More specifically, our proposed framework significantly boosts distilled quality, achieving an accuracy increase of up to 5.2 percentage points even with substantial dataset pruning, i.e., removing 80% of the original dataset prior to distillation. Overall, our experimental results highlight the advantages of our easy-sample prioritization and cross-architecture robustness, paving the way for more effective and high-quality dataset distillation.
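The pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact sampling procedure, so several things here are assumptions: that per-sample loss values from some pre-trained classifier are already available, that "easy-sample prioritization" means keeping the lowest-loss samples, and that pruning is applied per class to preserve class balance. The function name `prune_by_loss` and its `keep_fraction` parameter are hypothetical and do not come from the paper or its code.

```python
import numpy as np

def prune_by_loss(losses, labels, keep_fraction=0.2):
    """Return sorted indices of the lowest-loss samples within each class.

    Assumption: low loss == "easy" sample. keep_fraction=0.2 mirrors the
    abstract's setting of removing 80% of the data before distillation.
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    kept = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        # Keep at least one sample per class so no class disappears.
        n_keep = max(1, int(round(keep_fraction * cls_idx.size)))
        # Stable sort by loss, ascending: easiest samples come first.
        order = np.argsort(losses[cls_idx], kind="stable")
        kept.append(cls_idx[order[:n_keep]])
    return np.sort(np.concatenate(kept))

# Toy example: 10 samples, 2 classes, made-up loss values.
losses = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.05]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
core_idx = prune_by_loss(losses, labels, keep_fraction=0.2)
print(core_idx)  # indices of the retained core-set
```

The resulting index set would then be handed to whichever distillation method is used; that downstream step is outside the scope of this sketch.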
