CV · LG · Jun 10, 2020

Dataset Condensation with Gradient Matching

arXiv:2006.05929v3 · 703 citations
AI Analysis

This paper addresses data efficiency for machine learning practitioners by enabling training on condensed datasets, though the contribution is incremental in that it builds on gradient matching techniques.

The paper tackles the cost of dataset storage and model training by proposing Dataset Condensation, a method that condenses a large dataset into a small set of synthetic samples, and reports that it significantly outperforms state-of-the-art methods on several computer vision benchmarks.

As state-of-the-art machine learning methods in many fields rely on ever larger datasets, storing datasets and training models on them become significantly more expensive. This paper proposes a training set synthesis technique for data-efficient learning, called Dataset Condensation, that learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch. We formulate this goal as a gradient matching problem between the gradients of deep neural network weights that are trained on the original and our synthetic data. We rigorously evaluate its performance on several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods. Finally, we explore the use of our method in continual learning and neural architecture search and report promising gains when limited memory and computation are available.
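To make the gradient matching idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes a linear model with a mean squared error loss instead of a deep network, a cosine-based distance between gradients, and a finite-difference update of the synthetic inputs instead of backpropagation. All function names (`mse_gradient`, `cosine_distance`, `matching_loss`, `condense_step`) are hypothetical and chosen for illustration only.

```python
import math


def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error of a linear model w.x with respect to w."""
    n = len(xs)
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj / n
    return grad


def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity; close to 0 when both gradients point the same way."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def matching_loss(w, real_x, real_y, syn_x, syn_y):
    """Distance between the weight gradients on the real and the synthetic data."""
    g_real = mse_gradient(w, real_x, real_y)
    g_syn = mse_gradient(w, syn_x, syn_y)
    return cosine_distance(g_real, g_syn)


def condense_step(w, real_x, real_y, syn_x, syn_y, lr=0.1, h=1e-5):
    """One descent step on the synthetic inputs.

    Assumption: slopes are estimated by forward finite differences to keep
    the sketch dependency-free; a real system would use automatic differentiation.
    """
    base = matching_loss(w, real_x, real_y, syn_x, syn_y)
    new_x = [row[:] for row in syn_x]
    for i in range(len(syn_x)):
        for j in range(len(syn_x[i])):
            bumped = [row[:] for row in syn_x]
            bumped[i][j] += h
            bumped_loss = matching_loss(w, real_x, real_y, bumped, syn_y)
            slope = (bumped_loss - base) / h
            new_x[i][j] = syn_x[i][j] - lr * slope
    return new_x
```

In this toy setting, `matching_loss` is near zero when the synthetic set produces a weight gradient pointing in the same direction as the real data, and calling `condense_step` repeatedly nudges the synthetic inputs toward that state for the given weights `w`. The step size `lr` is an arbitrary choice here and may need tuning.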

Code Implementations · 5 repos
