LGCVOCMLJan 13, 2025

Dataset Distillation as Pushforward Optimal Quantization

arXiv:2501.07681v21 citationsh-index: 1
Originality Highly original
AI Analysis

This work improves dataset distillation efficiency for machine learning practitioners by providing a faster and more scalable method with strong empirical results.

The paper tackles dataset distillation by reformulating disentangled methods as an optimal quantization problem, achieving state-of-the-art performance on ImageNet-1K with better inter-model generalization and trivial additional computation.

Dataset distillation aims to find a synthetic training set such that training on the synthetic data achieves similar performance to training on real data, with orders of magnitude less computational requirements. Existing methods can be broadly categorized as either bi-level optimization problems that have neural network training heuristics as the lower level problem, or disentangled methods that bypass the bi-level optimization by matching distributions of data. The latter method has the major advantages of speed and scalability in terms of size of both training and distilled datasets. We demonstrate that when equipped with an encoder-decoder structure, the empirically successful disentangled methods can be reformulated as an optimal quantization problem, where a finite set of points is found to approximate the underlying probability measure by minimizing the expected projection distance. In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems, demonstrating consistency of distilled datasets for diffusion-based generative priors. We propose Dataset Distillation by Optimal Quantization, based on clustering in a latent space. Compared to the previous SOTA method D\textsuperscript{4}M, we achieve better performance and inter-model generalization on the ImageNet-1K dataset with trivial additional computation, and SOTA performance in higher image-per-class settings. Using the distilled noise initializations in a stronger diffusion transformer model, we obtain SOTA distillation performance on ImageNet-1K and its subsets, outperforming diffusion guidance methods.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes