On the Computational Complexity of Private High-dimensional Model Selection
This work addresses privacy-preserving model selection for high-dimensional data, representing an incremental improvement by combining existing methods with computational enhancements.
The authors tackled the problem of high-dimensional sparse linear regression model selection under privacy constraints by proposing a differentially private best subset selection method, achieving polynomial mixing time for computational efficiency and demonstrating effective feature identification in simulations.
We consider the problem of model selection in a high-dimensional sparse linear regression model under privacy constraints. We propose a differentially private (DP) best subset selection method with strong statistical utility properties by adopting the well-known exponential mechanism for selecting the best model. To achieve computational expediency, we propose an efficient Metropolis-Hastings algorithm and under certain regularity conditions, we establish that it enjoys polynomial mixing time to its stationary distribution. As a result, we also establish both approximate differential privacy and statistical utility for the estimates of the mixed Metropolis-Hastings chain. Finally, we perform some illustrative experiments on simulated data showing that our algorithm can quickly identify active features under reasonable privacy budget constraints.