CR · Apr 7, 2021

PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database

Sara Jafarbeiki, Amin Sakzad, Shabnam Kasra Kermanshahi, Raj Gaire, Ron Steinfeld, Shangqi Lai, Gad Abraham

arXiv:2104.02890v26.61 citations

Originality: Incremental advance

AI Analysis

This work addresses privacy concerns for biomedical researchers and clinicians handling sensitive genomic data, though it appears incremental because it extends existing SSE techniques to a new data type.

The authors tackle the problem of protecting single nucleotide polymorphism (SNP)-phenotype data in genomic databases. They propose PrivGenDB, a model that uses searchable symmetric encryption to store encrypted data securely and execute several query types over it efficiently. Evaluations show query times of approximately 4.3 s for count/Boolean queries and 86.4 μs for k'-out-of-k match queries over 40 SNPs.

Searchable symmetric encryption (SSE) has been used to protect the confidentiality of genomic data while providing substring search and range queries on a sequence of genomic data, but it has not been studied for protecting single nucleotide polymorphism (SNP)-phenotype data. In this article, we propose a novel model, PrivGenDB, for securely storing and efficiently conducting different queries on genomic data outsourced to an honest-but-curious cloud server. To instantiate PrivGenDB, we use SSE to ensure confidentiality while conducting different types of queries on the encrypted genomic data, phenotypes, and other information of individuals, to help analysts and clinicians in their analysis and care. To the best of our knowledge, the PrivGenDB construction is the first SSE-based approach that ensures the confidentiality of shared SNP-phenotype data through encryption while keeping the computation/query process efficient and scalable for biomedical research and care. Furthermore, it supports a variety of query types on genomic data, including count queries, Boolean queries, and k'-out-of-k match queries. Finally, the PrivGenDB model handles datasets containing both genotype and phenotype data, and it also supports storing and managing other metadata, such as gender and ethnicity, privately. Computer evaluations on a dataset with 5,000 records and 1,000 SNPs demonstrate that a count/Boolean query and a k'-out-of-k match query over 40 SNPs take approximately 4.3 s and 86.4 μs, respectively, which outperforms existing schemes.
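
To make the three query types named in the abstract concrete, the following is a minimal plaintext sketch of what each query asks of a SNP-phenotype table. It is not the PrivGenDB construction: nothing here is encrypted, and the record layout, the SNP identifiers (`rs1`, `rs2`, `rs3`), the field names, and all function names are hypothetical choices made for illustration. Boolean queries are simplified to conjunctions of equality predicates, which is an assumption rather than the paper's definition.

```python
from typing import Dict, List, Tuple

# Hypothetical toy records: one genotype per SNP, plus phenotype and metadata.
# Field names and values are invented for illustration only.
Record = Dict[str, str]
Predicate = Tuple[str, str]  # (field name, required value)

RECORDS: List[Record] = [
    {"rs1": "AG", "rs2": "CC", "rs3": "TT", "phenotype": "case", "gender": "F"},
    {"rs1": "AG", "rs2": "CT", "rs3": "TT", "phenotype": "control", "gender": "M"},
    {"rs1": "AA", "rs2": "CC", "rs3": "TT", "phenotype": "case", "gender": "F"},
    {"rs1": "AG", "rs2": "CC", "rs3": "TC", "phenotype": "case", "gender": "M"},
]


def num_matches(record: Record, predicates: List[Predicate]) -> int:
    """Count how many of the predicates this record satisfies."""
    return sum(record.get(field) == value for field, value in predicates)


def boolean_query(records: List[Record], predicates: List[Predicate]) -> List[int]:
    """Return indices of records satisfying ALL predicates (conjunction only)."""
    return [
        i for i, record in enumerate(records)
        if num_matches(record, predicates) == len(predicates)
    ]


def count_query(records: List[Record], predicates: List[Predicate]) -> int:
    """Return how many records satisfy all predicates, without listing them."""
    return len(boolean_query(records, predicates))


def k_prime_out_of_k_query(
    records: List[Record], predicates: List[Predicate], k_prime: int
) -> List[int]:
    """Return indices of records matching at least k_prime of the k predicates."""
    return [
        i for i, record in enumerate(records)
        if num_matches(record, predicates) >= k_prime
    ]


if __name__ == "__main__":
    snps = [("rs1", "AG"), ("rs2", "CC"), ("rs3", "TT")]
    print(count_query(RECORDS, [("rs1", "AG"), ("phenotype", "case")]))
    print(boolean_query(RECORDS, [("rs1", "AG"), ("rs2", "CC")]))
    print(k_prime_out_of_k_query(RECORDS, snps, k_prime=3))
```

In an SSE-based system the server would evaluate such queries over an encrypted index using search tokens issued by the data owner, rather than scanning plaintext records as this sketch does.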

