LG CR · Nov 4, 2024

R+R: Understanding Hyperparameter Effects in DP-SGD

Felix Morsbach, Jan Reubold, Thorsten Strufe

arXiv:2411.02051v1 · 4.61 citations · h-index: 3 · ACSAC

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of optimizing hyperparameters for better privacy-utility trade-offs in DP-SGD. Its contribution is incremental: it synthesizes and tests existing conjectures rather than introducing new methods.

The authors address inconsistent and anecdotal claims about hyperparameter effects in DP-SGD with a replication study spanning multiple datasets, model architectures, and privacy budgets. They could not consistently replicate conjectures about the batch size and the number of epochs, but they did replicate the conjectured relationship between the clipping threshold and the learning rate, and they quantified the importance of that combination relative to the other hyperparameters.

Research on the effects of essential hyperparameters of DP-SGD lacks consensus, verification, and replication. Contradictory and anecdotal statements on their influence make matters worse. While DP-SGD is the standard optimization algorithm for privacy-preserving machine learning, its adoption is still commonly challenged by low performance compared to non-private learning approaches. As proper hyperparameter settings can improve the privacy-utility trade-off, understanding the influence of the hyperparameters promises to simplify their optimization towards better performance, and likely foster acceptance of private learning. To shed more light on these influences, we conduct a replication study: We synthesize extant research on hyperparameter influences of DP-SGD into conjectures, conduct a dedicated factorial study to independently identify hyperparameter effects, and assess which conjectures can be replicated across multiple datasets, model architectures, and differential privacy budgets. While we cannot (consistently) replicate conjectures about the main and interaction effects of the batch size and the number of epochs, we were able to replicate the conjectured relationship between the clipping threshold and learning rate. Furthermore, we were able to quantify the significant importance of their combination compared to the other hyperparameters.
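The abstract does not spell out the conjectured relationship between the clipping threshold and the learning rate, so the following is only an illustrative sketch of the standard DP-SGD update (clip each per-example gradient, sum, add Gaussian noise scaled by the clipping threshold, average, step). It shows one commonly noted reason the two hyperparameters are coupled: when every per-example gradient is clipped, the update depends only on the product of learning rate and clipping threshold. The function name `dp_sgd_step`, its parameters, and the toy gradients are hypothetical and not taken from the paper.

```python
import math
import random


def dp_sgd_step(theta, per_example_grads, lr, clip_norm, noise_multiplier, seed=0):
    """One illustrative DP-SGD update on a flat parameter list.

    Hypothetical helper: clips each per-example gradient to L2 norm
    `clip_norm`, sums them, adds Gaussian noise with standard deviation
    `noise_multiplier * clip_norm`, averages over the batch, and takes
    a gradient step with learning rate `lr`.
    """
    rng = random.Random(seed)  # fixed seed so runs are comparable
    dim = len(theta)
    summed = [0.0] * dim

    for grad in per_example_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale down only if the gradient exceeds the clipping threshold.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += grad[i] * scale

    batch_size = len(per_example_grads)
    new_theta = []
    for i in range(dim):
        noise = rng.gauss(0.0, noise_multiplier * clip_norm)
        new_theta.append(theta[i] - lr * (summed[i] + noise) / batch_size)
    return new_theta


# Toy gradients whose norms (5, 10, 13) all exceed both thresholds below.
grads = [[3.0, 4.0], [-6.0, 8.0], [5.0, 12.0]]
step_a = dp_sgd_step([0.0, 0.0], grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, seed=42)
step_b = dp_sgd_step([0.0, 0.0], grads, lr=0.05, clip_norm=2.0, noise_multiplier=1.1, seed=42)
```

With the same noise seed, `step_a` and `step_b` agree up to floating-point error, because doubling the clipping threshold while halving the learning rate leaves the update unchanged in this fully clipped regime. Whether and how strongly this holds in practice is what a factorial study such as the one described above would need to measure.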
