CaO$_2$: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation
This work addresses inefficiencies in creating compact surrogate datasets for machine learning, offering an incremental improvement over existing diffusion-based methods.
The paper tackles two inconsistencies in diffusion-based dataset distillation, objective mismatch and condition mismatch, by introducing CaO$_2$, a two-stage framework that achieves state-of-the-art performance on ImageNet and its subsets, surpassing the best-performing baselines by an average of 2.3% accuracy.
The recent introduction of diffusion models in dataset distillation has shown promising potential in creating compact surrogate datasets for large, high-resolution target datasets, offering improved efficiency and performance over traditional bi-level/uni-level optimization methods. However, current diffusion-based dataset distillation approaches overlook the evaluation process and exhibit two critical inconsistencies: (1) Objective Inconsistency, where the distillation process diverges from the evaluation objective, and (2) Condition Inconsistency, where generated images do not match their corresponding conditions. To resolve these issues, we introduce Condition-aware Optimization with Objective-guided Sampling (CaO$_2$), a two-stage diffusion-based framework that aligns the distillation process with the evaluation objective. The first stage employs a probability-informed sample selection pipeline, while the second stage refines the corresponding latent representations to improve conditional likelihood. CaO$_2$ achieves state-of-the-art performance on ImageNet and its subsets, surpassing the best-performing baselines by an average of 2.3% accuracy.
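
The two stages can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the selection rule or the likelihood model, so the code assumes (1) that "probability-informed" selection means keeping the candidates to which a classifier assigns the highest probability for the intended class, and (2) that a linear softmax classifier can stand in for the diffusion model's conditional likelihood during latent refinement. The function names `select_by_class_probability` and `refine_latent`, and all parameters, are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def select_by_class_probability(logits, target_class, k):
    """Stage 1 (assumed reading): keep the k candidates with the highest
    classifier probability for the intended class.

    logits: array of shape (num_candidates, num_classes).
    Returns the indices of the selected candidates, best first.
    """
    target_log_probs = log_softmax(logits)[:, target_class]
    order = np.argsort(-target_log_probs, kind="stable")
    return order[:k]


def refine_latent(z, weights, target_class, steps=20, lr=0.1):
    """Stage 2 (toy stand-in): gradient ascent on log p(target_class | z).

    A linear softmax model with logits = weights @ z replaces the diffusion
    model's conditional likelihood; this substitution is an assumption.
    weights: array of shape (num_classes, latent_dim); z: shape (latent_dim,).
    """
    z = z.astype(float).copy()
    one_hot = np.eye(weights.shape[0])[target_class]
    for _ in range(steps):
        probs = np.exp(log_softmax(weights @ z))
        # Gradient of log p(c | z) with respect to z for a linear softmax model.
        z += lr * weights.T @ (one_hot - probs)
    return z


if __name__ == "__main__":
    candidate_logits = np.array([
        [2.0, 0.0, 0.0],
        [0.0, 2.0, 0.0],
        [5.0, 0.0, 0.0],
        [1.0, 1.0, 1.0],
    ])
    print(select_by_class_probability(candidate_logits, target_class=0, k=2))

    W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
    z0 = np.array([0.0, 0.5])
    z1 = refine_latent(z0, W, target_class=0)
    print(log_softmax(W @ z0)[0], log_softmax(W @ z1)[0])
```

In this sketch, selection addresses the objective side (samples are chosen by how a classifier scores them, which is closer to how the distilled set is evaluated), and refinement addresses the condition side (each latent is nudged so that its class label becomes more likely). In the actual method both steps operate on a diffusion model, which the toy version does not attempt to reproduce.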