LG | ML | Jun 15, 2020

Flexible Dataset Distillation: Learn Labels Instead of Images

arXiv:2006.08572v3 | 124 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient dataset creation for machine learning practitioners. It offers a more flexible and broadly compatible approach, though the advance is incremental because it builds on prior distillation work.

The paper tackles dataset distillation by proposing label distillation, which learns synthetic labels for a small set of real images rather than synthesizing the images themselves. It reports better results than prior image-based methods, along with greater flexibility.

We study the problem of dataset distillation - creating a small set of synthetic examples capable of training a good model. In particular, we study the problem of label distillation - creating synthetic labels for a small set of real images, and show it to be more effective than the prior image-based approach to dataset distillation. Methodologically, we introduce a more robust and flexible meta-learning algorithm for distillation, as well as an effective first-order strategy based on convex optimization layers. Distilling labels with our new algorithm leads to improved results over prior image-based distillation. More importantly, it leads to clear improvements in flexibility of the distilled dataset in terms of compatibility with off-the-shelf optimizers and diverse neural architectures. Interestingly, label distillation can also be applied across datasets, for example enabling learning Japanese character recognition by training only on synthetically labeled English letters.
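To make the idea of label distillation concrete, the following is a minimal sketch and not the authors' algorithm. It assumes fixed linear features in place of a neural network and uses closed-form ridge regression as one example of a convex inner model. The names `ridge_fit` and `distill_labels`, the random data, and all hyperparameters are hypothetical. The sketch learns soft labels for a small base set so that a ridge model fit on those labels alone predicts well on a larger target set.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Closed-form ridge weights: (X^T X + lam * I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def distill_labels(X_base, X_target, Y_target, lam=1.0, steps=300):
    """Learn soft labels for X_base by gradient descent on the target loss."""
    n_base, d = X_base.shape
    n_classes = Y_target.shape[1]
    Y_syn = np.full((n_base, n_classes), 1.0 / n_classes)  # uniform start

    # The ridge solution is linear in the labels: W = M @ Y_syn.
    M = np.linalg.solve(X_base.T @ X_base + lam * np.eye(d), X_base.T)
    P = X_target @ M                     # maps synthetic labels to predictions
    n_entries = Y_target.size
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    lr = n_entries / (2.0 * np.linalg.norm(P, 2) ** 2)

    losses = []
    for _ in range(steps):
        resid = P @ Y_syn - Y_target     # target-set prediction error
        losses.append(float(np.mean(resid ** 2)))
        grad = (2.0 / n_entries) * (P.T @ resid)  # d(mean sq. error)/dY_syn
        Y_syn -= lr * grad
    return Y_syn, losses

# Hypothetical toy data standing in for image features.
rng = np.random.default_rng(0)
X_base = rng.normal(size=(20, 5))        # small set of "real" base examples
X_target = rng.normal(size=(200, 5))     # larger target training set
true_W = rng.normal(size=(5, 3))
Y_target = np.eye(3)[np.argmax(X_target @ true_W, axis=1)]  # one-hot labels

Y_syn, losses = distill_labels(X_base, X_target, Y_target)
W = ridge_fit(X_base, Y_syn, lam=1.0)    # train only on the distilled labels
print(losses[0], losses[-1])
```

Because the ridge solution is linear in the labels, the outer objective here is a convex quadratic and its gradient has a closed form. That makes this toy version far simpler than the meta-learning setting described in the abstract, where the inner learner is a neural network.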

Code Implementations: 2 repos
