Better Private Linear Regression Through Better Private Feature Selection
This work addresses practical usability issues that data analysts face in private linear regression. It appears incremental, however, because it extends existing algorithms rather than introducing a fundamentally new paradigm.
The paper tackles differentially private linear regression in high-dimensional settings, where users struggle to set data bounds without examining the data and violating privacy. It introduces a private feature selection method based on Kendall rank correlation. Experiments across 25 datasets show that running this selection step before regression broadens the applicability of plug-and-play private linear regression algorithms at little additional cost to privacy or computation.
Existing work on differentially private linear regression typically assumes that end users can precisely set data bounds or algorithmic hyperparameters. End users often struggle to meet these requirements without directly examining the data (and violating privacy). Recent work has attempted to develop solutions that shift these burdens from users to algorithms, but they struggle to provide utility as the feature dimension grows. This work extends these algorithms to higher-dimensional problems by introducing a differentially private feature selection method based on Kendall rank correlation. We prove a utility guarantee for the setting where features are normally distributed and conduct experiments across 25 datasets. We find that adding this private feature selection step before regression significantly broadens the applicability of "plug-and-play" private linear regression algorithms at little additional cost to privacy, computation, or decision-making by the end user.
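
To make the idea of rank-correlation-based private feature selection concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not specify the selection mechanism, so this sketch swaps in a deliberately simple one, namely adding Laplace noise to each feature's absolute Kendall tau-a score and keeping the k largest noisy scores. The function names (kendall_tau_a, laplace_noise, private_top_k_features), the even split of the privacy budget across features, and the replace-one neighboring relation (with the sample size n treated as public) are all assumptions made for illustration. The sensitivity bound does follow from the definition: replacing one record changes at most n - 1 pairs, each by at most 2, so tau-a moves by at most 2(n - 1) / (n(n - 1)/2) = 4/n. Because ranks are bounded by construction, no user-supplied data bounds are needed.

```python
import random


def _sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / (n choose 2)."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += _sign(x[i] - x[j]) * _sign(y[i] - y[j])
    return total / (n * (n - 1) / 2)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_top_k_features(columns, y, k, epsilon, rng=None):
    """Illustrative epsilon-DP selection of k feature indices.

    columns: list of d feature columns, each a list of n numbers.
    Assumption: each of the d scores gets budget epsilon / d (basic
    composition); taking the top k of the noisy scores afterwards is
    post-processing and costs no extra privacy.
    """
    rng = rng or random.Random()
    n, d = len(y), len(columns)
    sensitivity = 4.0 / n              # tau-a under replace-one neighbors
    scale = sensitivity * d / epsilon  # Laplace scale at budget epsilon / d
    noisy = [
        abs(kendall_tau_a(col, y)) + laplace_noise(scale, rng)
        for col in columns
    ]
    return sorted(range(d), key=lambda j: noisy[j], reverse=True)[:k]
```

The selected indices would then be passed to whichever private regression routine is in use, with the overall privacy budget divided between selection and regression. Note that the noise scale in this sketch grows linearly with the number of features d, which is exactly the kind of cost a more careful selection mechanism would aim to reduce.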