LG, AI | May 5, 2023

A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness

Zongxiong Chen, Jiahui Geng, Derui Zhu, Herbert Woisetschlaeger, Qing Li, Sonja Schimmler, Ruben Mayer, Chunming Rong

arXiv:2305.03355v3 | 13.0 | 10 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses security and fairness concerns in dataset distillation for researchers and practitioners. It is an incremental analysis of existing methods that highlights their risks.

The study evaluated state-of-the-art dataset distillation methods and found that models trained on distilled data remain vulnerable to membership inference attacks. It also found that distillation can affect model robustness to varying degrees and amplify unfairness across classes.

The aim of dataset distillation is to encode the rich features of an original dataset into a tiny dataset. It is a promising approach to accelerate neural network training and related studies. Different approaches have been proposed to improve the informativeness and generalization performance of distilled images. However, no work has comprehensively analyzed this technique from a security perspective and there is a lack of systematic understanding of potential risks. In this work, we conduct extensive experiments to evaluate current state-of-the-art dataset distillation methods. We successfully use membership inference attacks to show that privacy risks still remain. Our work also demonstrates that dataset distillation can cause varying degrees of impact on model robustness and amplify model unfairness across classes when making predictions. This work offers a large-scale benchmarking framework for dataset distillation evaluation.
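The abstract does not say which membership inference attack was used, so the following is only a minimal sketch of one common variant, a loss-threshold attack, to illustrate the kind of privacy test described. It assumes you already have per-sample losses from a model trained on the distilled data: one array for samples from the original training set (members) and one for held-out samples (non-members). The function name `loss_threshold_mia` and its inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def loss_threshold_mia(member_losses, nonmember_losses):
    """Loss-threshold membership inference (illustrative sketch).

    Predicts "member" when a sample's loss is <= threshold, and returns the
    threshold with the highest balanced accuracy together with that accuracy.
    Assumption: lower loss suggests the sample influenced training.
    """
    member_losses = np.asarray(member_losses, dtype=float)
    nonmember_losses = np.asarray(nonmember_losses, dtype=float)

    # Every observed loss value is a candidate threshold (sorted ascending).
    candidates = np.unique(np.concatenate([member_losses, nonmember_losses]))

    best_threshold, best_accuracy = float(candidates[0]), 0.0
    for t in candidates:
        true_positive_rate = np.mean(member_losses <= t)    # members caught
        true_negative_rate = np.mean(nonmember_losses > t)  # non-members rejected
        balanced_accuracy = 0.5 * (true_positive_rate + true_negative_rate)
        if balanced_accuracy > best_accuracy:
            best_threshold, best_accuracy = float(t), float(balanced_accuracy)
    return best_threshold, best_accuracy

# Toy usage with made-up losses (not results from the paper).
threshold, accuracy = loss_threshold_mia([0.1, 0.2, 0.3], [0.9, 1.0, 1.1])
print(threshold, accuracy)
```

A balanced accuracy near 0.5 means the attack cannot tell members from non-members, while values well above 0.5 indicate leakage. Because the threshold here is tuned on the same samples it is scored on, the reported accuracy is optimistic; a careful evaluation would pick the threshold on a separate split.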
