CV, May 19, 2025

DD-Ranking: Rethinking the Evaluation of Dataset Distillation

Zekai Li, Xinhao Zhong, Samir Khaki, Zhiyuan Liang, Yuhao Zhou, Mingjia Shi, Ziqiao Wang, Xuanlei Zhao, Wangbo Zhao, Ziheng Qin, Mengxuan Wu, Pengfei Zhou

arXiv:2505.13300v319.011 citations, h-index: 46, Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a critical evaluation misalignment in dataset distillation research. The contribution is incremental but essential for advancing the field, because it enables fair comparisons between methods.

The paper tackles the problem of unreliable accuracy metrics in evaluating dataset distillation methods, finding that performance gains often come from additional techniques rather than distilled image quality, and proposes DD-Ranking as a unified framework with new metrics to provide fairer evaluation.

In recent years, dataset distillation has provided a reliable solution for data compression, where models trained on the resulting smaller synthetic datasets achieve performance comparable to those trained on the original datasets. To further improve the performance of synthetic datasets, various training pipelines and optimization objectives have been proposed, greatly advancing the field of dataset distillation. Recent decoupled dataset distillation methods introduce soft labels and stronger data augmentation during the post-evaluation phase and scale dataset distillation up to larger datasets (e.g., ImageNet-1K). However, this raises a question: Is accuracy still a reliable metric to fairly evaluate dataset distillation methods? Our empirical findings suggest that the performance improvements of these methods often stem from additional techniques rather than the inherent quality of the images themselves, with even randomly sampled images achieving superior results. Such misaligned evaluation settings severely hinder the development of dataset distillation. Therefore, we propose DD-Ranking, a unified evaluation framework, along with new general evaluation metrics to uncover the true performance improvements achieved by different methods. By refocusing on the actual information enhancement of distilled datasets, DD-Ranking provides a more comprehensive and fair evaluation standard for future research advancements.
