Generalizing Dataset Distillation via Deep Generative Prior
This work addresses a key limitation of dataset distillation that matters to machine learning practitioners, though the contribution is incremental because it builds on existing techniques.
The paper tackles two shortcomings of dataset distillation: poor generalization across architectures and poor scaling to high-resolution datasets. It uses pre-trained deep generative models to synthesize the distilled data, which yields significant improvements in cross-architecture generalization.
Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data. Despite recent progress in the field, existing dataset distillation methods fail to generalize to new architectures and to scale to high-resolution datasets. To overcome these issues, we propose to use the learned prior from pre-trained deep generative models to synthesize the distilled data. To achieve this, we present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space. Our method augments existing techniques, significantly improving cross-architecture generalization in all settings.
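To make the idea of optimizing in a generator's latent space concrete, here is a minimal toy sketch in pure Python. It is not the paper's algorithm. As assumptions for illustration only, a fixed affine map (`W`, `B`) stands in for the pre-trained deep generative model, a simple mean-matching loss stands in for whichever existing distillation objective is being augmented, and all names (`generate`, `matching_loss`, `distill`) are hypothetical. The one point it shares with the described approach is that the generator stays frozen and only a few latent vectors are updated.

```python
import random

# Hypothetical frozen "generator": a fixed affine map from a 2-D latent space
# to a 3-D data space. It stands in for a pre-trained deep generative model.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [0.5, -0.5, 0.0]


def generate(z):
    """Decode one latent vector: G(z) = W z + B. W and B are never updated."""
    return [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(W, B)]


def mean(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def matching_loss(latents, target_mean):
    """Squared distance between the mean decoded sample and the real-data mean."""
    gen_mean = mean([generate(z) for z in latents])
    return sum((g - t) ** 2 for g, t in zip(gen_mean, target_mean))


def distill(real_data, num_latents=2, steps=200, lr=0.1, seed=0):
    """Optimize only the latent vectors by gradient descent on matching_loss."""
    rng = random.Random(seed)
    target_mean = mean(real_data)
    latents = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(num_latents)]
    for _ in range(steps):
        gen_mean = mean([generate(z) for z in latents])
        residual = [g - t for g, t in zip(gen_mean, target_mean)]
        # Analytic gradient: dL/dz_i = (2 / n) * W^T residual, identical for all i.
        grad = [
            (2.0 / num_latents) * sum(W[d][j] * residual[d] for d in range(3))
            for j in range(2)
        ]
        for z in latents:
            for j in range(2):
                z[j] -= lr * grad[j]
    return latents


# Toy "real" dataset: 100 points decoded from latents scattered around (1, -1).
_rng = random.Random(1)
real_data = [generate([_rng.gauss(1.0, 0.5), _rng.gauss(-1.0, 0.5)]) for _ in range(100)]

distilled_latents = distill(real_data)                        # two latent vectors
distilled_points = [generate(z) for z in distilled_latents]   # decoded synthetic data
```

In this sketch the 100 "real" points are summarized by two latent vectors, and the synthetic data are obtained by decoding them through the frozen map. With a real generative model the gradient would come from automatic differentiation through the network rather than the closed-form expression used here.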