LG CR
Nov 27, 2021

Towards Understanding the Impact of Model Size on Differential Private Classification

Yinchen Shen, Zhiguo Wang, Ruoyu Sun, Xiaojing Shen

arXiv:2111.13895v19.216 citations

Originality: Incremental advance

AI Analysis

This paper addresses a practical problem for practitioners applying differential privacy in machine learning, though the contribution appears incremental because it builds on known observations about the effect of model size.

The authors investigated why larger models perform worse than smaller ones in differentially private classification, showing theoretically that with sufficient dimensionality and DP noise, classification error approaches random guessing, and proposed a feature selection method that improves performance on real data.

Differential privacy (DP) is an essential technique for privacy preservation. A large model trained with privacy preservation has been found to perform worse than a smaller one (e.g., ResNet50 performs worse than ResNet18). To better understand this phenomenon, we study high-dimensional DP learning from the viewpoint of generalization. Theoretically, we show that for a simple Gaussian model with even small DP noise, if the dimension is large enough, the classification error can be as bad as random guessing. We then propose a feature selection method that reduces the size of the model, based on a new metric that trades off classification accuracy against privacy preservation. Experiments on real data support our theoretical results and demonstrate the advantage of the proposed method.
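The dimension effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' construction or proof. It assumes a two-class Gaussian model with a fixed-norm class mean, a linear classifier built from the sample mean, and independent Gaussian noise added to every coordinate as a stand-in for a DP mechanism. The function name `dp_gaussian_error` and all parameter values are hypothetical, and the noise scale is not calibrated to any particular privacy budget.

```python
import math
import random


def dp_gaussian_error(d, n=50, signal_norm=2.0, sigma=1.0,
                      dp_noise_std=0.5, seed=0):
    """Test error of a noise-perturbed mean classifier in dimension d.

    Assumed data model: x = y * mu + N(0, sigma^2 I) with y in {-1, +1}.
    """
    rng = random.Random(seed)
    # Class mean with a fixed norm, spread evenly over all d coordinates,
    # so the total signal does not grow with the dimension.
    mu = [signal_norm / math.sqrt(d)] * d

    # Non-private estimate of mu: the average of y_i * x_i over n samples.
    w = [0.0] * d
    for _ in range(n):
        y = rng.choice((-1.0, 1.0))
        for j in range(d):
            x_j = y * mu[j] + rng.gauss(0.0, sigma)
            w[j] += y * x_j / n

    # Stand-in for DP noise: independent Gaussian noise on every coordinate.
    # The scale is an illustrative assumption, not tied to (epsilon, delta).
    w = [w_j + rng.gauss(0.0, dp_noise_std) for w_j in w]

    # Exact error of the rule sign(<w, x>) under the data model:
    # Phi(-<w, mu> / (sigma * ||w||)), with Phi the standard normal CDF.
    dot = sum(w_j * m_j for w_j, m_j in zip(w, mu))
    norm = math.sqrt(sum(w_j * w_j for w_j in w))
    return 0.5 * math.erfc(dot / (sigma * norm * math.sqrt(2.0)))


if __name__ == "__main__":
    for dim in (10, 100, 2000):
        print(dim, round(dp_gaussian_error(dim), 3))
```

With the signal norm held fixed, the perturbation contributes squared norm roughly proportional to d, so the angle between the learned direction and the true mean widens and the error drifts toward 0.5 as the dimension grows. This is the intuition behind shrinking the model through feature selection, although the paper's own selection metric is not reproduced here.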
