Multivariate Mean Comparison under Differential Privacy
This work addresses privacy concerns in statistical inference: study participants who fear exposure may conceal their true data, which distorts the analysis. The contribution is incremental, since it adapts existing methods to provide privacy guarantees.
The paper tackles the problem of comparing multivariate population means while protecting individual privacy. It develops a differentially private hypothesis test based on Hotelling's $t^2$-statistic, together with a differentially private bootstrap algorithm to control the type-1 error, and demonstrates the applicability of the approach in an empirical study.
The comparison of multivariate population means is a central task of statistical inference. While statistical theory provides a variety of analysis tools, they usually do not protect individuals' privacy. Knowing this, participants in a study may have an incentive to conceal their true data (especially if they are outliers), which can distort the analysis. In this paper we address this problem by developing a hypothesis test for multivariate mean comparisons that guarantees differential privacy to users. The test statistic is based on the popular Hotelling's $t^2$-statistic, which has a natural interpretation in terms of the Mahalanobis distance. To control the type-1 error, we present a bootstrap algorithm under differential privacy that provably yields a reliable test decision. In an empirical study we demonstrate the applicability of this approach.
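To make the link between Hotelling's statistic and the Mahalanobis distance concrete, the following is a minimal sketch of the classical, non-private two-sample statistic. It is not the paper's private test: no noise is added and no bootstrap is performed. The function name `hotelling_t2`, the two-sample setting, and the pooled-covariance form are assumptions chosen for illustration.

```python
import numpy as np


def hotelling_t2(x, y):
    """Classical (non-private) two-sample Hotelling statistic.

    x, y: arrays of shape (n1, d) and (n2, d), one row per observation.
    Illustrative only; this offers no differential privacy guarantee.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[0], y.shape[0]

    # Difference of the two sample mean vectors.
    diff = x.mean(axis=0) - y.mean(axis=0)

    # Pooled sample covariance (assumes a common covariance matrix).
    s_pooled = (
        (n1 - 1) * np.cov(x, rowvar=False) + (n2 - 1) * np.cov(y, rowvar=False)
    ) / (n1 + n2 - 2)
    s_pooled = np.atleast_2d(s_pooled)

    # Squared Mahalanobis distance between the sample means.
    d2 = float(diff @ np.linalg.solve(s_pooled, diff))

    # Scale by the effective sample size.
    return n1 * n2 / (n1 + n2) * d2
```

The statistic is large when the sample means are far apart relative to the variability in the data, which is the sense in which it measures a Mahalanobis distance. A private version would have to bound the influence of any single observation and add calibrated noise, which is where the bootstrap for type-1 error control becomes necessary.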