CV | Jun 27, 2025

CaO$_2$: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation

Haoxuan Wang, Zhenghao Zhao, Junyi Wu, Yuzhang Shang, Gaowen Liu, Yan Yan

arXiv:2506.22637v219.012 citations | h-index: 17 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses inconsistencies in how diffusion-based methods create compact surrogate datasets for machine learning, offering an incremental improvement over existing diffusion-based approaches.

The paper tackles two inconsistencies in diffusion-based dataset distillation, an objective mismatch and a condition mismatch, by introducing CaO$_2$, a two-stage framework that achieves state-of-the-art performance on ImageNet and its subsets, surpassing the best-performing baselines by an average of 2.3% accuracy.

The recent introduction of diffusion models in dataset distillation has shown promising potential in creating compact surrogate datasets for large, high-resolution target datasets, offering improved efficiency and performance over traditional bi-level/uni-level optimization methods. However, current diffusion-based dataset distillation approaches overlook the evaluation process and exhibit two critical inconsistencies in the distillation process: (1) Objective Inconsistency, where the distillation process diverges from the evaluation objective, and (2) Condition Inconsistency, leading to mismatches between generated images and their corresponding conditions. To resolve these issues, we introduce Condition-aware Optimization with Objective-guided Sampling (CaO$_2$), a two-stage diffusion-based framework that aligns the distillation process with the evaluation objective. The first stage employs a probability-informed sample selection pipeline, while the second stage refines the corresponding latent representations to improve conditional likelihood. CaO$_2$ achieves state-of-the-art performance on ImageNet and its subsets, surpassing the best-performing baselines by an average of 2.3% accuracy.
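The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the selection criterion or the refinement objective, so the sketch assumes a small linear softmax classifier as a stand-in for the model that scores how well a latent matches its class condition. All names (`class_probs`, `select_samples`, `refine_latent`, the weight matrix `W`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(z, W):
    """p(class | z) under an assumed linear classifier with logits W[k] . z."""
    logits = [sum(w * x for w, x in zip(row, z)) for row in W]
    return softmax(logits)


def select_samples(candidates, W, label, k):
    """Stage 1 (toy): keep the k candidates with the highest p(label | z)."""
    ranked = sorted(candidates,
                    key=lambda z: class_probs(z, W)[label],
                    reverse=True)
    return ranked[:k]


def refine_latent(z, W, label, steps=50, lr=0.1):
    """Stage 2 (toy): gradient ascent on log p(label | z) w.r.t. the latent."""
    z = list(z)
    for _ in range(steps):
        p = class_probs(z, W)
        # d/dz log p_label = W[label] - sum_k p_k * W[k]
        grad = [W[label][i] - sum(p[k] * W[k][i] for k in range(len(W)))
                for i in range(len(z))]
        z = [x + lr * g for x, g in zip(z, grad)]
    return z


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]          # assumed two-class classifier
    pool = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.5, 0.0]]
    chosen = select_samples(pool, W, label=0, k=2)
    refined = [refine_latent(z, W, label=0) for z in chosen]
    print(chosen, refined)
```

In the actual method the score would come from the diffusion pipeline rather than a linear classifier; the sketch only shows the select-then-refine structure, where selection filters candidates by class probability and refinement nudges each kept latent toward higher conditional likelihood.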
