LGMay 23, 2023

On the Size and Approximation Error of Distilled Sets

Alaa Maalouf, Murad Tukan, Noel Loo, Ramin Hasani, Mathias Lechner, Daniela Rus

arXiv:2305.14113v17.74 citations

Originality Highly original

AI Analysis

This provides foundational theoretical guarantees for dataset distillation, addressing a gap in understanding for researchers in machine learning compression and efficiency.

The paper tackles the theoretical limitations of dataset distillation by proving the existence of small distilled datasets with bounded excess risk for kernel ridge regression using shift-invariant kernels, showing that the distilled set size is linear in the dimension of random Fourier features space and the error depends on the regularization parameter.

Dataset Distillation is the task of synthesizing small datasets from large ones while still retaining comparable predictive accuracy to the original uncompressed dataset. Despite significant empirical progress in recent years, there is little understanding of the theoretical limitations/guarantees of dataset distillation, specifically, what excess risk is achieved by distillation compared to the original dataset, and how large are distilled datasets? In this work, we take a theoretical view on kernel ridge regression (KRR) based methods of dataset distillation such as Kernel Inducing Points. By transforming ridge regression in random Fourier features (RFF) space, we provide the first proof of the existence of small (size) distilled datasets and their corresponding excess risk for shift-invariant kernels. We prove that a small set of instances exists in the original input space such that its solution in the RFF space coincides with the solution of the original data. We further show that a KRR solution can be generated using this distilled set of instances which gives an approximation towards the KRR solution optimized on the full input data. The size of this set is linear in the dimension of the RFF space of the input set or alternatively near linear in the number of effective degrees of freedom, which is a function of the kernel, number of datapoints, and the regularization parameter $λ$. The error bound of this distilled set is also a function of $λ$. We verify our bounds analytically and empirically.

View on arXiv PDF

Similar