CR, Aug 29, 2019

How Much Does GenoGuard Really "Guard"? An Empirical Analysis of Long-Term Security for Genomic Data

Bristena Oprisanu, Christophe Dessimoz, Emiliano De Cristofaro

arXiv:1908.11315v12.7

Originality: Incremental advance

AI Analysis

This work highlights critical vulnerabilities in GenoGuard, the only available tool for long-term security of genomic data. Such protection is crucial because genomic data carries sensitive hereditary information.

The paper analyzes the real-world security of GenoGuard, a tool for long-term genomic data encryption. It finds that an adversary who holds part of the target sequence as side information can use a GenoGuard ciphertext to better determine the rest of the sequence, achieving up to 15% better accuracy than state-of-the-art inference methods in some cases.

Due to its hereditary nature, genomic data is linked not only to its owner but to close relatives as well. As a result, its sensitivity does not really degrade over time; in fact, a genomic sequence is likely to remain relevant for longer than the security provided by encryption lasts. This prompts the need for specialized techniques providing long-term security for genomic data, yet the only available tool for this purpose is GenoGuard (Huang et al., 2015). By relying on Honey Encryption, GenoGuard is secure against an adversary that can brute force all possible keys; i.e., whenever an attacker tries to decrypt using an incorrect password, she will obtain an incorrect but plausible-looking decoy sequence. In this paper, we set out to analyze the real-world security guarantees provided by GenoGuard; specifically, we assess how much more information access to a ciphertext encrypted using GenoGuard yields, compared to one that was not. Overall, we find that, if the adversary has access to side information in the form of partial information from the target sequence, the use of GenoGuard does appreciably increase her power in determining the rest of the sequence. We show that, in the case of a sequence encrypted using an easily guessable (low-entropy) password, the adversary is able to rule out most decoy sequences and obtain the target sequence with just 2.5% of it available as side information. In the case of a harder-to-guess (high-entropy) password, we show that the adversary still obtains, on average, better accuracy in guessing the rest of the target sequence than using state-of-the-art genomic sequence inference methods, obtaining up to 15% improvement in accuracy.
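The low-entropy-password scenario can be illustrated with a toy sketch. It is not GenoGuard's actual encoder and not the paper's attack. It assumes a uniform encoding over sequences of values in {0, 1, 2}, a hypothetical 1000-entry password dictionary, and side information covering every fourth position. All names and parameters below are illustrative assumptions. Every password decrypts to a well-formed decoy, and the adversary keeps only the candidates that agree with the positions she already knows.

```python
import hashlib
import random

N_SNVS = 40             # toy sequence length (illustrative assumption)
MODULUS = 3 ** N_SNVS   # each position takes a value in {0, 1, 2}

def encode(seq):
    """Toy uniform encoding: read the sequence as base-3 digits."""
    seed = 0
    for value in seq:
        seed = seed * 3 + value
    return seed

def decode(seed):
    """Inverse of encode; always returns a well-formed sequence."""
    digits = []
    for _ in range(N_SNVS):
        seed, value = divmod(seed, 3)
        digits.append(value)
    return digits[::-1]

def pad(password):
    """Password-derived pad (SHA-256 stands in for a real KDF)."""
    digest = hashlib.sha256(password.encode()).digest()
    return int.from_bytes(digest, "big") % MODULUS

def encrypt(seq, password):
    return (encode(seq) + pad(password)) % MODULUS

def decrypt(ciphertext, password):
    # Any password yields a valid-looking sequence: a decoy unless correct.
    return decode((ciphertext - pad(password)) % MODULUS)

# Hypothetical target sequence and low-entropy password dictionary.
rng = random.Random(0)
target = [rng.randrange(3) for _ in range(N_SNVS)]
dictionary = [f"pw{i:04d}" for i in range(1000)]
true_password = "pw0417"
ciphertext = encrypt(target, true_password)

# Side information: the adversary already knows every fourth position.
side_info = {i: target[i] for i in range(0, N_SNVS, 4)}

# Brute-force the dictionary, keeping only candidates consistent
# with the known positions.
survivors = [
    pw for pw in dictionary
    if all(decrypt(ciphertext, pw)[i] == v for i, v in side_info.items())
]
print(f"{len(survivors)} of {len(dictionary)} candidates survive")
```

In this toy, decoys are uniformly random, so each known position eliminates roughly two thirds of the remaining wrong passwords. Real genomic sequences are far from uniform, so these numbers say nothing about the 2.5% figure reported in the paper.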

