CR · Feb 10, 2022
Collaborative analysis of genomic data: vision and challenges
Sara Jafarbeiki, Raj Gaire, Amin Sakzad et al.
The falling cost of DNA sequencing has led to a surge in genetic data being used to improve scientific research, clinical procedures, and healthcare delivery in recent years. Because the human genome can uniquely identify an individual, this data also raises security and privacy concerns. To balance the risks and benefits, governance mechanisms, including regulatory and ethical controls, have been established. However, these mechanisms are prone to human error and can hinder collaboration. Over the past decade, technological methods that can support critical discoveries responsibly have also been catching up. In this paper, we explore regulations and ethical guidelines and propose our vision of secure and private platforms for storing, processing, and sharing genomic data. We then present some available techniques and a conceptual system model that can support this vision. Finally, we highlight the open issues that need further investigation.
CR · Apr 7, 2021
PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database
Sara Jafarbeiki, Amin Sakzad, Shabnam Kasra Kermanshahi et al.
Searchable symmetric encryption (SSE) has been used to protect the confidentiality of genomic data while supporting substring search and range queries over genomic sequences, but it has not been studied for protecting single nucleotide polymorphism (SNP)-phenotype data. In this article, we propose a novel model, PrivGenDB, for securely storing and efficiently querying genomic data outsourced to an honest-but-curious cloud server. To instantiate PrivGenDB, we use SSE to ensure confidentiality while conducting different types of queries on encrypted genomic data, phenotypes, and other information about individuals, helping analysts and clinicians with their analysis and care. To the best of our knowledge, the PrivGenDB construction is the first SSE-based approach that ensures the confidentiality of shared SNP-phenotype data through encryption while keeping the computation and query process efficient and scalable for biomedical research and care. Furthermore, it supports a variety of query types on genomic data, including count queries, Boolean queries, and k'-out-of-k match queries. Finally, PrivGenDB handles datasets containing both genotype and phenotype data, and it also supports storing and managing other metadata, such as gender and ethnicity, privately. Evaluations on a dataset with 5,000 records and 1,000 SNPs demonstrate that a count/Boolean query and a k'-out-of-k match query over 40 SNPs take approximately 4.3 s and 86.4 μs, respectively, which outperforms existing schemes.
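To make the three query types concrete, the following is a minimal plaintext sketch of what each query returns. It is not PrivGenDB's construction. In PrivGenDB, the server evaluates these queries over an SSE-encrypted index, whereas here everything is in the clear. The `Record` layout, the 0/1/2 genotype coding, the made-up SNP identifiers, and the treatment of a Boolean query as a conjunction (AND) of SNP conditions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    # Hypothetical layout (plaintext only): SNP id -> genotype coded as 0, 1 or 2
    # (number of copies of the minor allele), plus phenotype and metadata.
    snps: Dict[str, int]
    phenotype: str
    gender: str
    ethnicity: str


def boolean_query(records: List[Record], criteria: Dict[str, int],
                  phenotype: Optional[str] = None) -> List[int]:
    """Indices of records matching every queried SNP value (assumed AND)
    and, if given, the requested phenotype."""
    return [
        i for i, r in enumerate(records)
        if all(r.snps.get(snp) == value for snp, value in criteria.items())
        and (phenotype is None or r.phenotype == phenotype)
    ]


def count_query(records: List[Record], criteria: Dict[str, int],
                phenotype: Optional[str] = None) -> int:
    """Number of records satisfying the same condition as boolean_query."""
    return len(boolean_query(records, criteria, phenotype))


def k_out_of_k_match(records: List[Record], criteria: Dict[str, int],
                     k_prime: int) -> List[int]:
    """Indices of records agreeing with at least k_prime of the k queried SNPs."""
    hits = []
    for i, r in enumerate(records):
        agreements = sum(1 for snp, value in criteria.items()
                         if r.snps.get(snp) == value)
        if agreements >= k_prime:
            hits.append(i)
    return hits


# Made-up example data; SNP names are placeholders, not real rsIDs.
records = [
    Record({"snp1": 0, "snp2": 1, "snp3": 2}, "diabetic", "F", "A"),
    Record({"snp1": 0, "snp2": 2, "snp3": 2}, "healthy", "M", "B"),
    Record({"snp1": 1, "snp2": 1, "snp3": 0}, "diabetic", "M", "A"),
]
query = {"snp1": 0, "snp2": 1, "snp3": 2}  # k = 3 queried SNPs

print(boolean_query(records, query))            # [0]
print(count_query(records, query, "diabetic"))  # 1
print(k_out_of_k_match(records, query, 2))      # [0, 1]
```

The k'-out-of-k match relaxes the Boolean query. Record 1 disagrees on `snp2` but still agrees on two of the three queried SNPs, so it is returned when k' = 2.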
CR · Jan 7, 2020
Towards Practical Encrypted Network Traffic Pattern Matching for Secure Middleboxes
Shangqi Lai, Xingliang Yuan, Shi-Feng Sun et al.
Network Function Virtualisation (NFV) advances the adoption of composable software middleboxes. Accordingly, cloud data centres have become major NFV vendors for enterprise traffic processing. Because redirecting traffic to the cloud raises privacy concerns, secure middlebox systems (e.g., BlindBox), which can process encrypted packets against encrypted rules directly, have drawn much attention. However, most existing systems that support pattern-matching-based network functions require the enterprise gateway to tokenise packet payloads via sliding windows. Such tokenisation induces considerable communication overhead, which can exceed 100× the packet size. To overcome this bottleneck, in this paper, we propose the first bandwidth-efficient encrypted pattern matching protocol for secure middleboxes. We build on a primitive called symmetric hidden vector encryption (SHVE) and propose a variant of it, SHVE+, that achieves constant and moderate communication cost. To speed up matching, we devise encrypted filters that greatly reduce the number of accesses to SHVE+. We formalise the security of the proposed protocol and conduct comprehensive evaluations over real-world rulesets and traffic dumps. The results show that our design can inspect a packet against over 20k rules within 100 μs. Compared to prior work, it reduces bandwidth consumption by 94%.
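The bandwidth bottleneck comes from sliding-window tokenisation. The gateway emits one token for every byte offset of the payload, so the number of tokens is roughly the payload length, and each token is sent as a fixed-size ciphertext. The following is a minimal plaintext sketch of that tokenisation and the resulting size ratio. The window length, the payload length, and the 16-byte encrypted-token size are hypothetical parameters, not values taken from BlindBox or from this paper. The sketch also ignores the actual encryption step.

```python
from typing import List


def sliding_window_tokens(payload: bytes, window: int) -> List[bytes]:
    """Every contiguous substring of length `window`, one per byte offset,
    i.e. what a gateway would extract before encrypting each token."""
    if window <= 0 or len(payload) < window:
        return []
    return [payload[i:i + window] for i in range(len(payload) - window + 1)]


def tokenisation_overhead(payload_len: int, window: int, token_bytes: int) -> float:
    """Bytes of (encrypted) tokens sent per byte of payload.
    token_bytes is an assumed fixed ciphertext size per token."""
    n_tokens = max(payload_len - window + 1, 0)
    return n_tokens * token_bytes / payload_len


print(sliding_window_tokens(b"abcdef", 4))  # [b'abcd', b'bcde', b'cdef']

# Hypothetical parameters: a 1460-byte payload, 8-byte windows, 16-byte tokens.
print(round(tokenisation_overhead(1460, 8, 16), 2))  # 15.92
```

Because the token count tracks the payload length, the overhead ratio is roughly the per-token ciphertext size in bytes. This ratio is independent of how much traffic is actually suspicious, which is why a scheme with constant per-packet communication cost, such as the SHVE+-based design described above, can save so much bandwidth.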
CR · Nov 14, 2019
Enabling Efficient Privacy-Assured Outlier Detection over Encrypted Incremental Datasets
Shangqi Lai, Xingliang Yuan, Amin Sakzad et al.
Outlier detection is widely used in practice to track anomalies in incremental datasets such as network traffic and system logs. However, these datasets often involve sensitive information, and sharing them with third parties for anomaly detection raises privacy concerns. In this paper, we present a privacy-preserving outlier detection protocol (PPOD) for incremental datasets. The protocol decomposes the outlier detection algorithm into several phases and identifies the cryptographic operations required in each phase. It realises several cryptographic modules via efficient and interchangeable protocols that support these operations, and composes them into the overall protocol to enable outlier detection over encrypted datasets. To support efficient updates, it integrates the sliding window model, which periodically evicts expired data to maintain a constant update time. We build a prototype of PPOD and systematically evaluate the cryptographic modules and the overall protocol under various parameter settings. Our results show that PPOD can handle encrypted incremental datasets with moderate computation and communication costs.
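The sliding window model can be illustrated with a small plaintext sketch. Appending a new point evicts the oldest one once the window is full, so each update does a bounded amount of work. The outlier definition used here is an assumption chosen for illustration, not necessarily the algorithm PPOD decomposes. Under this distance-based definition, a point is flagged if fewer than `min_neighbours` other points in the window lie within `radius`. The class and parameter names are hypothetical, and none of PPOD's cryptographic modules are modelled.

```python
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, ...]


class SlidingWindowOutlierDetector:
    """Plaintext sketch (hypothetical API): count-based sliding window plus an
    assumed distance-based outlier rule."""

    def __init__(self, window_size: int, radius: float, min_neighbours: int):
        # deque(maxlen=...) silently drops the oldest point when full,
        # which models eviction of expired data at constant cost per insert.
        self.window: Deque[Point] = deque(maxlen=window_size)
        self.radius = radius
        self.min_neighbours = min_neighbours

    @staticmethod
    def _sq_dist(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def insert(self, p: Point) -> None:
        self.window.append(p)

    def outliers(self) -> List[Point]:
        """Points in the current window with too few neighbours within radius."""
        pts = list(self.window)
        r2 = self.radius ** 2
        result = []
        for i, p in enumerate(pts):
            neighbours = sum(1 for j, q in enumerate(pts)
                             if j != i and self._sq_dist(p, q) <= r2)
            if neighbours < self.min_neighbours:
                result.append(p)
        return result


d = SlidingWindowOutlierDetector(window_size=5, radius=1.0, min_neighbours=2)
for p in [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0)]:
    d.insert(p)
print(d.outliers())  # [(10.0, 10.0)]

# New data arrives; the two oldest points are evicted from the window.
for p in [(10.2, 10.0), (10.0, 10.3), (9.9, 10.1)]:
    d.insert(p)
print(d.outliers())  # [(0.0, 0.5)]
```

The example shows how eviction changes the result. Once the cluster near (10, 10) fills the window, the lone remaining point near the origin becomes the outlier. A time-based window, which evicts points older than a fixed age, would be an equally valid reading of "expired data".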