CV AI | Jan 11, 2025

FocusDD: Real-World Scene Infusion for Robust Dataset Distillation

Youbing Hu, Yun Cheng, Olga Saukh, Firat Ozdemir, Anqi Lu, Zhiqiang Cao, Zhijun Li

arXiv:2501.06405v1 | 4 citations | h-index: 7

Originality: Highly original

AI Analysis

This paper addresses the cost of training on large datasets by compressing them while maintaining performance, a novel advance in dataset distillation.

The paper tackles dataset distillation's difficulty with large-scale, high-resolution datasets by introducing FocusDD, a resolution-independent method that uses key patches to create diverse, realistic distilled data. On ImageNet-1K with 100 IPC, it achieves validation accuracies of 71.0% for ResNet50 and 62.6% for MobileNet-v2, outperforming SOTA by 2.8% and 4.7%. It also enables object detection, with YOLOv11n and YOLOv11s reaching 24.4% and 32.1% mAP on COCO2017 at 50 IPC.

Dataset distillation has emerged as a strategy to compress real-world datasets for efficient training. However, it struggles with large-scale and high-resolution datasets, limiting its practicality. This paper introduces a novel resolution-independent dataset distillation method, Focused Dataset Distillation (FocusDD), which achieves diversity and realism in distilled data by identifying key information patches, thereby ensuring the generalization capability of the distilled dataset across different network architectures. Specifically, FocusDD leverages a pre-trained Vision Transformer (ViT) to extract key image patches, which are then synthesized into a single distilled image. These distilled images, which capture multiple targets, are suitable not only for classification tasks but also for dense tasks such as object detection. To further improve the generalization of the distilled dataset, each synthesized image is augmented with a downsampled view of the original image. Experimental results on the ImageNet-1K dataset demonstrate that, with 100 images per class (IPC), ResNet50 and MobileNet-v2 achieve validation accuracies of 71.0% and 62.6%, respectively, outperforming state-of-the-art methods by 2.8% and 4.7%. Notably, FocusDD is the first method to use distilled datasets for object detection tasks. On the COCO2017 dataset, with an IPC of 50, YOLOv11n and YOLOv11s achieve 24.4% and 32.1% mAP, respectively, further validating the effectiveness of our approach.
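
The patch-based pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the per-patch importance scores are assumed to be given (in the paper they would come from a pre-trained ViT, which is not run here), and the helper names `top_patches`, `tile`, and `downsample`, the grid layout, and the use of average pooling are all hypothetical choices made for illustration. How the downsampled view is combined with the synthesized image is not stated in the abstract, so the sketch simply produces both.

```python
import numpy as np


def top_patches(image, scores, patch, k):
    """Return the k patches of `image` with the highest importance scores.

    image:  (H, W, C) array with H and W divisible by `patch`.
    scores: (H // patch, W // patch) array of per-patch importance.
            Assumption: these would come from a ViT; here they are an input.
    """
    cols = scores.shape[1]
    order = np.argsort(scores.ravel())[::-1][:k]  # highest score first
    selected = []
    for idx in order:
        r, c = divmod(int(idx), cols)
        selected.append(image[r * patch:(r + 1) * patch,
                              c * patch:(c + 1) * patch])
    return selected


def tile(patches, grid):
    """Arrange grid * grid equally sized patches into one square image."""
    if len(patches) != grid * grid:
        raise ValueError("need exactly grid * grid patches")
    rows = [np.concatenate(patches[i * grid:(i + 1) * grid], axis=1)
            for i in range(grid)]
    return np.concatenate(rows, axis=0)


def downsample(image, factor):
    """Average-pool an (H, W, C) image by an integer factor."""
    h, w, c = image.shape
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


# Toy example: one 8x8 RGB "image" split into four 4x4 patches.
img = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
scores = np.array([[0.1, 0.9],
                   [0.5, 0.2]])  # hypothetical importance values

best = top_patches(img, scores, patch=4, k=2)
# In the described method, key patches from several source images would be
# combined; here the two best patches are reused to fill a 2x2 grid.
synthesized = tile(best + best, grid=2)
small_view = downsample(img, factor=2)
```

In this toy run the top-right patch (score 0.9) is selected first and the bottom-left patch (score 0.5) second; `synthesized` has the same 8x8 size as the input, and `small_view` is a 4x4 average-pooled copy of the original.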
