LGAICVApr 21, 2024

Elucidating the Design Space of Dataset Condensation

arXiv:2404.13733v437 citationsh-index: 10NIPS
Originality Incremental advance
AI Analysis

This work addresses scalability and design limitations in dataset condensation for machine learning practitioners, offering incremental improvements over existing methods.

The paper tackles the problem of dataset condensation by proposing a new framework that improves training efficiency and achieves state-of-the-art accuracy, such as 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, outperforming previous methods by margins up to 27.3%.

Dataset condensation, a concept within data-centric learning, efficiently transfers critical attributes from an original dataset to a synthetic version, maintaining both diversity and realism. This approach significantly improves model training efficiency and is adaptable across multiple application areas. Previous methods in dataset condensation have faced challenges: some incur high computational costs which limit scalability to larger datasets (e.g., MTT, DREAM, and TESLA), while others are restricted to less optimal design spaces, which could hinder potential improvements, especially in smaller datasets (e.g., SRe2L, G-VBSM, and RDED). To address these limitations, we propose a comprehensive design framework that includes specific, effective strategies like implementing soft category-aware matching and adjusting the learning rate schedule. These strategies are grounded in empirical evidence and theoretical backing. Our resulting approach, Elucidate Dataset Condensation (EDC), establishes a benchmark for both small and large-scale dataset condensation. In our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a compression ratio of 0.78%. This performance exceeds those of SRe2L, G-VBSM, and RDED by margins of 27.3%, 17.2%, and 6.6%, respectively.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes