Rajagopal Venkatesaramani

CR
4papers
54citations
Novelty46%
AI Score39

4 Papers

CRJan 11, 2023
Enabling Trade-offs in Privacy and Utility in Genomic Data Beacons and Summary Statistics

Rajagopal Venkatesaramani, Zhiyu Wan, Bradley A. Malin et al.

The collection and sharing of genomic data are becoming increasingly commonplace in research, clinical, and direct-to-consumer settings. The computational protocols typically adopted to protect individual privacy include sharing summary statistics, such as allele frequencies, or limiting query responses to the presence/absence of alleles of interest using web-services called Beacons. However, even such limited releases are susceptible to likelihood-ratio-based membership-inference attacks. Several approaches have been proposed to preserve privacy, which either suppress a subset of genomic variants or modify query responses for specific variants (e.g., adding noise, as in differential privacy). However, many of these approaches result in a significant utility loss, either suppressing many variants or adding a substantial amount of noise. In this paper, we introduce optimization-based approaches to explicitly trade off the utility of summary data or Beacon responses and privacy with respect to membership-inference attacks based on likelihood-ratios, combining variant suppression and modification. We consider two attack models. In the first, an attacker applies a likelihood-ratio test to make membership-inference claims. In the second model, an attacker uses a threshold that accounts for the effect of the data release on the separation in scores between individuals in the dataset and those who are not. We further introduce highly scalable approaches for approximately solving the privacy-utility tradeoff problem when information is either in the form of summary statistics or presence/absence queries. Finally, we show that the proposed approaches outperform the state of the art in both utility and privacy through an extensive evaluation with public datasets.

84.2AIApr 1
Therefore I am. I Think

Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov et al.

We consider the question: when a large language reasoning model makes a choice, did it think first and then decide to, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe successfully decodes tool-calling decisions from pre-generation activations with very high confidence, and in some cases, even before a single reasoning token is produced. Activation steering supports this causally: perturbing the decision direction leads to inflated deliberation, and flips behavior in many examples (between 7 - 79% depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought process often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.

CRDec 25, 2021
Defending Against Membership Inference Attacks on Beacon Services

Rajagopal Venkatesaramani, Zhiyu Wan, Bradley A. Malin et al.

Large genomic datasets are now created through numerous activities, including recreational genealogical investigations, biomedical research, and clinical care. At the same time, genomic data has become valuable for reuse beyond their initial point of collection, but privacy concerns often hinder access. Over the past several years, Beacon services have emerged to broaden accessibility to such data. These services enable users to query for the presence of a particular minor allele in a private dataset, information that can help care providers determine if genomic variation is spurious or has some known clinical indication. However, various studies have shown that even this limited access model can leak if individuals are members in the underlying dataset. Several approaches for mitigating this vulnerability have been proposed, but they are limited in that they 1) typically rely on heuristics and 2) offer probabilistic privacy guarantees, but neglect utility. In this paper, we present a novel algorithmic framework to ensure privacy in a Beacon service setting with a minimal number of query response flips (e.g., changing a positive response to a negative). Specifically, we represent this problem as combinatorial optimization in both the batch setting (where queries arrive all at once), as well as the online setting (where queries arrive sequentially). The former setting has been the primary focus in prior literature, whereas real Beacons allow sequential queries, motivating the latter investigation. We present principled algorithms in this framework with both privacy and, in some cases, worst-case utility guarantees. Moreover, through an extensive experimental evaluation, we show that the proposed approaches significantly outperform the state of the art in terms of privacy and utility.

LGFeb 17, 2021
Re-identification of Individuals in Genomic Datasets Using Public Face Images

Rajagopal Venkatesaramani, Bradley A. Malin, Yevgeniy Vorobeychik

DNA sequencing is becoming increasingly commonplace, both in medical and direct-to-consumer settings. To promote discovery, collected genomic data is often de-identified and shared, either in public repositories, such as OpenSNP, or with researchers through access-controlled repositories. However, recent studies have suggested that genomic data can be effectively matched to high-resolution three-dimensional face images, which raises a concern that the increasingly ubiquitous public face images can be linked to shared genomic data, thereby re-identifying individuals in the genomic data. While these investigations illustrate the possibility of such an attack, they assume that those performing the linkage have access to extremely well-curated data. Given that this is unlikely to be the case in practice, it calls into question the pragmatic nature of the attack. As such, we systematically study this re-identification risk from two perspectives: first, we investigate how successful such linkage attacks can be when real face images are used, and second, we consider how we can empower individuals to have better control over the associated re-identification risk. We observe that the true risk of re-identification is likely substantially smaller for most individuals than prior literature suggests. In addition, we demonstrate that the addition of a small amount of carefully crafted noise to images can enable a controlled trade-off between re-identification success and the quality of shared images, with risk typically significantly lowered even with noise that is imperceptible to humans.