Differentially Private Ordinary Least Squares
This work addresses the need for privacy-preserving statistical analysis in fields such as the social sciences and economics, where OLS is used for explanatory purposes; its contribution is incremental, however, since it builds on existing differential privacy methods.
The paper tackles the problem of providing statistical inference guarantees, such as confidence intervals and hypothesis tests, for ordinary least squares (OLS) regression under differential privacy. It shows that the Gaussian Johnson-Lindenstrauss Transform (JLT) approximates t-values well for well-spread data, and it derives, under certain conditions, confidence intervals for Ridge regression approximated via the JLT and for the Analyze Gauss algorithm.
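To make the JLT claim more concrete, the following Python sketch (NumPy and SciPy) projects the augmented matrix $A = [X, y]$ with an $r \times n$ matrix of i.i.d. $N(0,1)$ entries and reruns OLS on the projected rows. The helper names `ols_t_stats` and `gaussian_jlt`, the projection dimension `r`, and the synthetic data are hypothetical. Each projected row is an independent $N(0, A^\top A)$ sample, so, conditioned on the data, the statistic $(\tilde\beta_j - \hat\beta_j)/\widetilde{\mathrm{se}}_j$ is $t$-distributed with $r - p$ degrees of freedom and centered at the non-private OLS estimate $\hat\beta$. This illustrates only the projection step. It adds no privacy noise and does not check any well-spread condition, and it is not presented as the paper's private algorithm.

```python
import numpy as np
from scipy import stats


def ols_t_stats(X, y):
    """Return OLS coefficients and their classical standard errors."""
    n, p = X.shape
    gram_inv = np.linalg.inv(X.T @ X)          # assumes full column rank
    beta = gram_inv @ (X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)           # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(gram_inv))
    return beta, se


def gaussian_jlt(A, r, rng):
    """Hypothetical helper: map the n x d matrix A to r x d using an
    r x n matrix R with i.i.d. N(0, 1) entries (no privacy noise added)."""
    R = rng.normal(size=(r, A.shape[0]))
    return R @ A


# Hypothetical synthetic data: intercept plus two features.
rng = np.random.default_rng(1)
n, r = 5000, 400
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)
p = X.shape[1]

beta_hat, se_hat = ols_t_stats(X, y)                  # OLS on the full data
A_proj = gaussian_jlt(np.column_stack([X, y]), r, rng)
beta_tilde, se_tilde = ols_t_stats(A_proj[:, :p], A_proj[:, p])

# Centered at beta_hat, this statistic follows a t distribution with r - p
# degrees of freedom, because projected rows are i.i.d. N(0, A^T A).
t_proj = (beta_tilde - beta_hat) / se_tilde
t_crit = stats.t.ppf(0.975, r - p)
# 95% interval for beta_hat (the full-data OLS estimate), not for the true beta.
ci_proj = np.column_stack([beta_tilde - t_crit * se_tilde,
                           beta_tilde + t_crit * se_tilde])
```

Because `r` is much smaller than `n`, `se_tilde` is noticeably larger than `se_hat`. Relating `ci_proj` to the true coefficient, and adding the noise needed for privacy, are outside the scope of this sketch.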
Linear regression is one of the most prevalent techniques in machine learning; however, it is also common to use linear regression for its \emph{explanatory} capabilities rather than for label prediction. Ordinary Least Squares (OLS) is often used in statistics to establish a correlation between an attribute (e.g. gender) and a label (e.g. income) in the presence of other (potentially correlated) features. OLS assumes a particular model that randomly generates the data and derives \emph{$t$-values}, which represent the likelihood of each real value being the true correlation. Using $t$-values, OLS can release a \emph{confidence interval}, an interval on the reals that is likely to contain the true correlation; when this interval does not contain the origin, we can \emph{reject the null hypothesis}, as the true correlation is likely non-zero. Our work aims to achieve similar guarantees with differentially private estimators. First, we show that for well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a very good approximation of $t$-values. Second, when the JLT approximates Ridge regression (linear regression with $\ell_2$-regularization), we derive, under certain conditions, confidence intervals using the projected data. Lastly, we derive, under different conditions, confidence intervals for the "Analyze Gauss" algorithm (Dwork et al., STOC 2014).
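As an illustration of the classical, non-private OLS inference described above, here is a minimal Python sketch using NumPy and SciPy. It assumes the textbook model $y = X\beta + e$ with $e \sim N(0, \sigma^2 I)$ and a full-rank design. The function name `ols_inference` and the synthetic data are hypothetical. The sketch computes each coefficient's $t$-value, a 95% confidence interval, and whether the null hypothesis $\beta_j = 0$ is rejected because the interval excludes zero.

```python
import numpy as np
from scipy import stats


def ols_inference(X, y, alpha=0.05):
    """Return OLS coefficients, t-values, (1 - alpha) confidence intervals,
    and whether H0: beta_j = 0 is rejected, under the classical Gaussian model."""
    n, p = X.shape
    gram_inv = np.linalg.inv(X.T @ X)                 # (X^T X)^{-1}, assumes full column rank
    beta_hat = gram_inv @ (X.T @ y)                   # OLS estimate
    residuals = y - X @ beta_hat
    dof = n - p                                       # residual degrees of freedom
    sigma2_hat = residuals @ residuals / dof          # unbiased noise-variance estimate
    se = np.sqrt(sigma2_hat * np.diag(gram_inv))      # standard error of each coefficient
    t_values = beta_hat / se                          # t-value for H0: beta_j = 0
    t_crit = stats.t.ppf(1 - alpha / 2, dof)          # two-sided critical value
    ci = np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se])
    reject = (ci[:, 0] > 0) | (ci[:, 1] < 0)          # interval excludes zero
    return beta_hat, t_values, ci, reject


# Hypothetical synthetic data: intercept plus two features; only x1 has a true effect.
rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
beta_hat, t_values, ci, reject = ols_inference(X, y)
```

Here `reject[1]` should be `True` for the strong effect of `x1`. For `x2`, whose true coefficient is zero, a 95% interval excludes zero only about 5% of the time across repeated samples.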