Label-Consistent Dataset Distillation with Detector-Guided Refinement
This paper addresses dataset distillation, a technique for reducing storage and computational demands in machine learning, though it appears to be an incremental improvement over existing diffusion-based approaches.
The paper tackles the problem of label inconsistencies and insufficient detail in diffusion-based dataset distillation by proposing a detector-guided framework that identifies and refines anomalous synthetic samples. The method achieves state-of-the-art performance on validation sets, producing higher-quality images with richer details.
Dataset distillation (DD) aims to generate a compact yet informative dataset that achieves performance comparable to the original dataset, thereby reducing demands on storage and computational resources. Although diffusion models have made significant progress in dataset distillation, the generated surrogate datasets often contain samples with label inconsistencies or insufficient structural detail, leading to suboptimal downstream performance. To address these issues, we propose a detector-guided dataset distillation framework that explicitly leverages a pre-trained detector to identify and refine anomalous synthetic samples, thereby ensuring label consistency and improving image quality. Specifically, a detector model trained on the original dataset is employed to identify anomalous images exhibiting label mismatches or low classification confidence. For each defective image, multiple candidates are generated using a pre-trained diffusion model conditioned on the corresponding image prototype and label. The optimal candidate is then selected by jointly considering the detector's confidence score and dissimilarity to existing qualified synthetic samples, thereby ensuring both label accuracy and intra-class diversity. Experimental results demonstrate that our method can synthesize high-quality representative images with richer details, achieving state-of-the-art performance on the validation set.
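The detect-and-select steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the confidence threshold, the additive scoring rule, the `weight` parameter, the use of cosine distance on feature vectors, and all function names are hypothetical, since the abstract says only that confidence and dissimilarity are considered jointly. The diffusion-based candidate generation is omitted; candidates are assumed to arrive as feature vectors with detector confidences already computed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def is_anomalous(probs: Sequence[float], label: int, threshold: float = 0.5) -> bool:
    """Flag a synthetic sample whose detector prediction disagrees with its
    label, or whose confidence in that label falls below `threshold`.
    The threshold value is an assumption, not taken from the paper."""
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    return predicted != label or probs[label] < threshold


def select_candidate(
    candidates: List[Vector],
    confidences: Sequence[float],
    accepted: List[Vector],
    weight: float = 0.5,
) -> int:
    """Return the index of the candidate with the highest combined score.

    Assumed score: detector confidence + weight * (distance to the nearest
    already-accepted sample of the same class). With no accepted samples,
    the diversity term is zero and selection falls back to confidence.
    """
    best_idx, best_score = -1, float("-inf")
    for i, (feat, conf) in enumerate(zip(candidates, confidences)):
        if accepted:
            diversity = min(cosine_distance(feat, ref) for ref in accepted)
        else:
            diversity = 0.0
        score = conf + weight * diversity
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

In this sketch, a candidate that nearly duplicates an accepted sample needs a noticeably higher detector confidence to win, which is one simple way to trade off label accuracy against intra-class diversity. Setting `weight` to zero reduces the selection to picking the most confident candidate.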