Practical Dataset Distillation Based on Deep Support Vectors
This work addresses a practical problem for machine learning practitioners by making dataset distillation feasible when only part of the data is available, though the contribution appears incremental.
The paper tackles dataset distillation in practical scenarios with limited data access. It introduces a method that adds a Deep KKT (DKKT) loss to the conventional distillation process and reports improved performance over the distribution matching baseline on CIFAR-10.
Conventional dataset distillation requires significant computational resources and assumes access to the entire dataset. This assumption is impractical because it presumes that all data resides on a central server. In this paper, we focus on dataset distillation in practical scenarios where only a fraction of the entire dataset is accessible. We introduce a novel distillation method that augments the conventional process with general model knowledge by adding a Deep KKT (DKKT) loss. In practical settings, our approach improves on the baseline distribution matching distillation method on the CIFAR-10 dataset. Additionally, we present experimental evidence that Deep Support Vectors (DSVs) provide information not captured by the original distillation, and that integrating them enhances performance.
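As a rough illustration of how the DKKT term might enter the objective, the following is a sketch under assumptions the abstract does not state. The additive form with a weight $\alpha$, the feature extractor $\phi$, the pretrained parameters $\theta^*$, and the per-class mean-embedding form of the distribution matching loss are all assumed for illustration. The exact definition of $\mathcal{L}_{\mathrm{DKKT}}$ is left unspecified because the abstract does not give it.

```latex
% Hypothetical notation (not from the abstract):
%   S        synthetic (distilled) set, S_c its samples of class c
%   T'       the accessible fraction of the real dataset, T'_c its class-c samples
%   \phi     feature extractor used for distribution matching
%   \theta^* parameters of a pretrained model supplying "general model knowledge"
%   \alpha   assumed weighting coefficient for the DKKT term
\begin{align}
\mathcal{L}_{\mathrm{DM}}(S, T') &=
  \sum_{c=1}^{C} \left\|
    \frac{1}{|T'_c|} \sum_{x \in T'_c} \phi(x)
    - \frac{1}{|S_c|} \sum_{s \in S_c} \phi(s)
  \right\|_2^2 \\
\mathcal{L}_{\mathrm{total}}(S) &=
  \mathcal{L}_{\mathrm{DM}}(S, T')
  + \alpha \, \mathcal{L}_{\mathrm{DKKT}}(S; \theta^*)
\end{align}
```

Under this reading, the first term depends only on the accessible data, while the second draws on a pretrained model, which is one way general model knowledge could compensate for the portion of the dataset that is not available.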