MECR
May 3, 2012

Privacy-Preserving Data Sharing for Genome-Wide Association Studies

arXiv:1205.0739v1
159 citations
Originality: Incremental advance
AI Analysis

This work addresses the privacy concerns of genetic researchers and GWAS participants, but it is incremental, since it builds on existing differential privacy concepts.

The paper tackles the problem of sharing genome-wide association study (GWAS) data while preserving individual privacy. It proposes new differentially private methods for releasing aggregate statistics such as minor allele frequencies and p-values, and demonstrates them on a GWAS of canine hair length involving 685 dogs.
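As a rough illustration of the kind of release described above, the standard Laplace mechanism can perturb an allele frequency before it is published. This is a minimal sketch, not the paper's algorithm. It assumes that neighbouring datasets contain the same N individuals and differ in one individual's genotype, coded as the count (0, 1 or 2) of a fixed, pre-designated allele. Under that assumption one person can move the allele count by at most 2 out of 2N alleles, so the frequency has sensitivity 1/N. The function names `laplace_noise` and `dp_minor_allele_frequency` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace(0, scale) distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_minor_allele_frequency(genotypes, epsilon, rng=None):
    """Release the frequency of a fixed allele with epsilon-DP (sketch).

    genotypes: per-individual allele counts, each 0, 1 or 2 (assumed coding).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not genotypes or any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be a non-empty list of 0/1/2 counts")
    rng = rng or random.Random()
    n = len(genotypes)
    freq = sum(genotypes) / (2 * n)  # allele count over 2N alleles
    sensitivity = 1.0 / n            # one person changes the count by <= 2
    noisy = freq + laplace_noise(sensitivity / epsilon, rng)
    # Clamping to [0, 1] is post-processing and does not weaken the guarantee.
    return min(max(noisy, 0.0), 1.0)
```

Smaller epsilon means larger noise, which is the utility cost the abstract refers to; with hundreds of individuals the 1/N sensitivity keeps the noise modest for a single SNP.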

Traditional statistical methods for confidentiality protection of statistical databases do not scale well to GWAS (genome-wide association studies) databases, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although these guarantees come at a serious price in terms of data utility. Building on such notions, we propose new methods to release aggregate GWAS data without compromising an individual's privacy. We present methods for releasing differentially private minor allele frequencies, chi-square statistics and p-values. We compare these approaches on simulated data and on a GWAS of canine hair length involving 685 dogs. We also propose a privacy-preserving method for finding genome-wide associations based on a differentially private approach to penalized logistic regression.
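To make the chi-square and p-value release concrete, the sketch below computes Pearson's statistic for a 2×3 case/control by genotype table, adds Laplace noise, and derives a p-value from the noisy statistic. It is an illustration under stated assumptions, not the paper's method. In particular, `sensitivity` is a caller-supplied placeholder: a real release needs a proven bound on how much one individual can change the statistic, and the value 4.0 used in the usage example is hypothetical. The p-value uses the closed form for 2 degrees of freedom, P(X > x) = exp(-x/2), which holds for a 2×3 table. All function names are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` ~ Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def pearson_chi_square(table):
    """table: rows = (cases, controls), columns = genotype counts 0, 1, 2."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if any(t <= 0 for t in row_totals + col_totals):
        raise ValueError("all margins must be positive")
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def dp_chi_square_and_p(table, epsilon, sensitivity, rng=None):
    """Laplace-noised chi-square and its p-value (df = 2 for a 2x3 table).

    `sensitivity` is an assumed, caller-supplied bound, not derived here.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    noisy = pearson_chi_square(table) + laplace_noise(sensitivity / epsilon, rng)
    noisy = max(noisy, 0.0)          # post-processing: statistic is non-negative
    p_value = math.exp(-noisy / 2.0)  # chi-square survival function, 2 df
    return noisy, p_value


# Usage with a made-up table and a placeholder sensitivity.
stat, p = dp_chi_square_and_p([[10, 20, 30], [30, 20, 10]],
                              epsilon=1.0, sensitivity=4.0)
```

Because the p-value is a deterministic function of the already-noised statistic, it costs no additional privacy budget. Releasing statistics for many SNPs, however, consumes budget additively under sequential composition, so the total epsilon would have to be divided among them.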
