Differential Privacy for Network Connectedness Indices
This paper addresses privacy concerns for social scientists releasing network statistics, though the contribution is incremental: it adapts existing differential privacy methods to a specific setting.
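The paper tackles the problem of releasing network connectedness indices while preserving privacy. It argues that standard differential privacy techniques perform poorly here because connectedness indices have high global sensitivity and because a single node's attribute can feed into thousands of cells, which leads to poor composition. The proposed method adds noise to node attributes, analytically debiases the downstream statistics, and then applies a second, edge-level layer of noise. The authors prove consistency and asymptotic normality of the resulting estimators and report that the method works well in simulations and on real networks with as few as 200 nodes.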
Researchers increasingly use data on social and economic networks to study a range of social science questions, but releasing statistics derived from networks can raise significant privacy concerns. We show how to release network connectedness indices that quantify assortative mixing across node attributes under edge-adjacent differential privacy. Standard privacy techniques perform poorly in this setting both because connectedness indices have high global sensitivity and because a single node's attribute can potentially be an input to connectedness in thousands of cells, leading to poor composition. Our method, which is straightforward to apply, first adds noise to node attributes, then analytically debiases downstream statistics, and finally applies a second layer of noise to protect the presence or absence of individual edges. We prove consistency and asymptotic normality of our estimators for both discrete and continuous labels and show our method works well in simulations and on real networks with as few as 200 nodes collected by social scientists.
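The three steps described above (noise on node attributes, analytic debiasing, a second layer of edge-level noise) can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes binary labels, swaps in randomized response for the attribute noise and the Laplace mechanism for the edge layer, and releases only a raw count of edges joining two label-1 nodes rather than a normalized connectedness index. All function and parameter names (`randomized_response`, `debias_labels`, `private_within_group_edge_count`, `eps_attr`, `eps_edge`) are hypothetical.

```python
import numpy as np


def randomized_response(labels, eps_attr, rng):
    """Step 1 (assumed mechanism): keep each binary label with
    probability e^eps / (1 + e^eps), otherwise flip it."""
    p_keep = np.exp(eps_attr) / (1.0 + np.exp(eps_attr))
    flip = rng.random(labels.shape[0]) >= p_keep
    return np.where(flip, 1 - labels, labels), p_keep


def debias_labels(noisy, p_keep):
    """Step 2: since E[noisy | x] = (1 - p) + (2p - 1) x, this affine map
    gives a value whose expectation equals the true label x."""
    return (noisy - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)


def private_within_group_edge_count(edges, labels, eps_attr, eps_edge, rng):
    """Noisy, debiased count of edges whose endpoints both have label 1.

    edges: integer array of shape (m, 2), one row per undirected edge,
           with distinct endpoints (no self-loops).
    labels: integer array of 0/1 node labels.
    """
    noisy, p_keep = randomized_response(labels, eps_attr, rng)
    x_hat = debias_labels(noisy, p_keep)

    # Label noise is independent across nodes, so each product of debiased
    # endpoint labels is unbiased for the product of the true labels.
    debiased = float(np.sum(x_hat[edges[:, 0]] * x_hat[edges[:, 1]]))

    # Step 3: adding or removing one edge changes the sum by at most the
    # largest possible |x_hat_i * x_hat_j|, which is (p / (2p - 1))^2.
    sensitivity = (p_keep / (2.0 * p_keep - 1.0)) ** 2
    return debiased + rng.laplace(0.0, sensitivity / eps_edge)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=200)
    edges = np.array([[i, (i + 1) % 200] for i in range(200)])  # toy ring graph
    print(private_within_group_edge_count(edges, labels, 2.0, 1.0, rng))
```

In this sketch each node's label is perturbed once and reused for every statistic that depends on it, so the attribute privacy cost does not grow with the number of cells. Debiasing the labels before multiplying is what keeps the edge count unbiased, and the sensitivity used for the Laplace scale is computed from the debiased values, which can exceed 1 in magnitude.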