PRIMO: Private Regression in Multiple Outcomes
This work addresses privacy-preserving data analysis for settings such as genomic studies, where many regressions are run on the same features, and offers a scalable solution with lower error than running each private regression separately.
The authors tackle the problem of performing multiple private linear regressions that share features but have different outcomes, where naively repeating a private regression makes the error grow with the number of regressions. They develop scalable algorithms, including projection-based methods, whose asymptotic error does not depend on the number of regressions once that number is sufficiently large, and which improve accuracy on genomic risk prediction tasks.
We introduce a new private regression setting we call Private Regression in Multiple Outcomes (PRIMO), inspired by the common situation where a data analyst wants to perform a set of $l$ regressions while preserving privacy, where the features $X$ are shared across all $l$ regressions and each regression $i \in [l]$ has a different vector of outcomes $y_i$. Naively applying existing private linear regression techniques $l$ times leads to a $\sqrt{l}$ multiplicative increase in error over the standard linear regression setting. We apply a variety of techniques, including sufficient statistics perturbation (SSP) and geometric projection-based methods, to develop scalable algorithms that outperform this baseline across a range of parameter regimes. In particular, we obtain no dependence on $l$ in the asymptotic error when $l$ is sufficiently large. Empirically, on the task of genomic risk prediction with multiple phenotypes, we find that even for values of $l$ far smaller than the theory would predict, our projection-based method improves accuracy relative to the variant that does not use the projection.
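To make the SSP idea concrete, the following is a minimal sketch of sufficient statistics perturbation with shared features and $l$ outcomes, not the authors' algorithm. It assumes the Gaussian mechanism with its classical calibration (valid for $\epsilon < 1$), add/remove-one-row neighboring datasets, rows of $X$ clipped to norm at most 1, outcomes clipped to $[-1, 1]$, and an even split of the privacy budget between the two statistics. The function name `ssp_multi_outcome`, the `ridge` stabilizer, and the budget split are hypothetical choices made for illustration. The projection step mentioned above is not implemented here.

```python
import numpy as np


def ssp_multi_outcome(X, Y, epsilon, delta, ridge=1.0, rng=None):
    """Illustrative SSP for l regressions sharing X (hypothetical helper).

    X: (n, d) features, Y: (n, l) outcomes. Returns a (d, l) coefficient matrix.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    l = Y.shape[1]

    # Assumption: enforce bounded data so the sensitivities below hold.
    X = X / np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
    Y = np.clip(Y, -1.0, 1.0)

    def gaussian_sigma(sensitivity, eps, dlt):
        # Classical Gaussian-mechanism noise scale (assumes eps < 1).
        return sensitivity * np.sqrt(2.0 * np.log(1.25 / dlt)) / eps

    # One row x contributes x x^T to X^T X (Frobenius norm <= 1)
    # and x y^T to X^T Y (Frobenius norm <= sqrt(l)).
    sigma_xx = gaussian_sigma(1.0, epsilon / 2.0, delta / 2.0)
    sigma_xy = gaussian_sigma(np.sqrt(l), epsilon / 2.0, delta / 2.0)

    # Noise the upper triangle of X^T X and mirror it to keep symmetry.
    noise = rng.normal(0.0, sigma_xx, size=(d, d))
    noise = np.triu(noise) + np.triu(noise, 1).T
    noisy_xx = X.T @ X + noise

    # X^T Y is perturbed once for all l outcomes.
    noisy_xy = X.T @ Y + rng.normal(0.0, sigma_xy, size=(d, l))

    # The ridge term is a non-private stabilizer for the noisy system.
    return np.linalg.solve(noisy_xx + ridge * np.eye(d), noisy_xy)
```

In this sketch the noisy $X^\top X$ is computed once and reused for every outcome, but the noise added to $X^\top Y$ still scales with $\sqrt{l}$, which is the dependence the projection-based methods described above are meant to address.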