3.1 CR Apr 26
LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models
Kato Mivule
This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular datasets, to privacy auditing of Large Language Models (LLMs). We propose LLM-CEG, a systematic framework that employs membership inference attack (MIA) success rates as an empirical privacy gauge and model perplexity as a utility gauge, iteratively adjusting differential privacy parameters until both thresholds are jointly satisfied. A proof-of-concept prototype fine-tunes DistilGPT-2 on a synthetic clinical PII dataset under four privacy regimes using DP-SGD. Results indicate that DP-SGD reduces MIA attacker advantage by 71.5% while simultaneously improving out-of-distribution utility by 47-50% relative to the overfitted baseline, suggesting that differential privacy may act as implicit regularization under narrow fine-tuning conditions. We further extend the SIED engineering framework to the LLM context as LLM-SIED, providing an auditable, regulator-aligned process for privacy-compliant LLM deployment.
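The iterative gauge loop described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the choice of noise multiplier as the adjusted parameter, the threshold values, and the `toy_evaluate` stand-in (which returns made-up numbers instead of training a model) are all assumptions for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple


def ceg_search(
    noise_levels: Iterable[float],
    evaluate: Callable[[float], Tuple[float, float]],
    max_mia_advantage: float,
    max_perplexity: float,
) -> Optional[Tuple[float, float, float]]:
    """Try each privacy setting in order and return the first one whose
    privacy gauge (MIA advantage) and utility gauge (perplexity) both
    meet their thresholds. Returns None if no setting qualifies."""
    for noise in noise_levels:
        advantage, perplexity = evaluate(noise)
        if advantage <= max_mia_advantage and perplexity <= max_perplexity:
            return noise, advantage, perplexity
    return None


def toy_evaluate(noise: float) -> Tuple[float, float]:
    """Hypothetical stand-in for 'fine-tune with DP-SGD, then measure'.
    The formulas are invented so the loop has something to iterate over."""
    advantage = 0.40 / (1.0 + 4.0 * noise)  # made-up privacy curve
    perplexity = 20.0 + 10.0 * noise        # made-up utility curve
    return advantage, perplexity


# Assumed thresholds: advantage at most 0.10, perplexity at most 35.
result = ceg_search([0.0, 0.5, 1.0, 1.5], toy_evaluate, 0.10, 35.0)
print(result)
```

In a real audit, `evaluate` would wrap a training run and an attack evaluation, so each iteration is expensive; a coarse grid of settings is the simplest search strategy under that constraint.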
CR May 21, 2014
A Codon Frequency Obfuscation Heuristic for Raw Genomic Data Privacy
Kato Mivule
Genomic data provides clinical researchers with vast opportunities to study various patient ailments. Yet the same data contains revealing information, some of which a patient might want to keep concealed. The question then arises: how can an entity transact in full DNA data while concealing certain sensitive pieces of information in the genome sequence, and still maintain DNA data utility? In response to this question, we propose a codon frequency obfuscation heuristic, in which the codon frequency values of highly expressed genes are redistributed within the same amino acid group, generating an obfuscated DNA sequence. Our preliminary results show that it might be possible to publish an obfuscated DNA sequence with a desired level of similarity (utility) to the original DNA sequence.
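The general idea of changing codons while staying inside one amino acid group can be sketched as follows. This is not the author's heuristic: it simply rotates each codon to the next synonym in its group, uses a deliberately partial codon table (three amino acids from the standard genetic code), and measures similarity as the fraction of matching nucleotides. All function names are hypothetical.

```python
# Partial synonym table from the standard genetic code (assumption: only
# these three amino acids are obfuscated; other codons pass through).
SYNONYMS = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Leu": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
}
CODON_TO_AA = {codon: aa for aa, group in SYNONYMS.items() for codon in group}


def split_codons(dna):
    """Split a DNA string into codons; the length must be a multiple of 3."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def obfuscate(dna):
    """Replace each known codon with the next synonym in its group."""
    out = []
    for codon in split_codons(dna):
        aa = CODON_TO_AA.get(codon)
        if aa is None:
            out.append(codon)  # codon not in the partial table: keep as-is
        else:
            group = SYNONYMS[aa]
            out.append(group[(group.index(codon) + 1) % len(group)])
    return "".join(out)


def translate(dna):
    """Map codons to amino acid names (unknown codons map to themselves)."""
    return [CODON_TO_AA.get(codon, codon) for codon in split_codons(dna)]


def similarity(a, b):
    """Fraction of positions where two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


original = "GCTCTTGGAATG"
obfuscated = obfuscate(original)
print(obfuscated, similarity(original, obfuscated))
```

Because every substitution stays within a synonym group, `translate(original)` and `translate(obfuscated)` agree, while the nucleotide-level similarity drops below 1.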
CR Sep 25, 2013
SIED, a Data Privacy Engineering Framework
Kato Mivule
While a number of data privacy techniques have been proposed in recent years, few frameworks have been suggested for implementing the data privacy process. Most of the proposed approaches are tailored towards implementing a specific data privacy algorithm rather than the overall data privacy engineering and design process. As a contribution, this study therefore proposes SIED (Specification, Implementation, Evaluation, and Dissemination), a conceptual framework that takes a holistic approach to the data privacy engineering procedure by looking at the specification, implementation, evaluation, and, finally, dissemination of the privatized data sets.
CR Sep 16, 2013
An Investigation of Data Privacy and Utility Preservation using KNN Classification as a Gauge
Kato Mivule, Claude Turner
Organizations are obligated by law to safeguard the privacy of individuals when handling data sets containing personally identifiable information (PII). Nevertheless, during the process of data privatization, the utility or usefulness of the privatized data diminishes. Achieving the optimal balance between data privacy and utility needs has been documented as an NP-hard challenge. In this study, we investigate data privacy and utility preservation using KNN machine learning classification as a gauge.
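One way to picture classification error as a gauge is to compare KNN error on the original data with KNN error on a privatized copy. The sketch below is an assumption-laden illustration, not the study's method: the toy data, the use of Gaussian noise as the privatization step, leave-one-out evaluation, and all function names are made up for this example.

```python
import math
import random
from collections import Counter


def knn_loo_error(points, labels, k=3):
    """Leave-one-out KNN classification error (fraction misclassified)."""
    errors = 0
    for i, p in enumerate(points):
        # Distances from p to every other point, paired with that point's label.
        neighbours = sorted(
            (math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i
        )[:k]
        predicted = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
        errors += predicted != labels[i]
    return errors / len(points)


def add_noise(points, sigma, rng):
    """Privatize by adding zero-mean Gaussian noise to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


# Toy data: two well-separated clusters of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

rng = random.Random(42)
baseline_error = knn_loo_error(points, labels)
light_error = knn_loo_error(add_noise(points, 0.1, rng), labels)
heavy_error = knn_loo_error(add_noise(points, 50.0, rng), labels)
print(baseline_error, light_error, heavy_error)
```

A low error after privatization suggests utility is largely intact, while a high error suggests the perturbation has destroyed the structure a classifier relies on; where to set the acceptable threshold is a policy choice outside this sketch.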
CR Sep 16, 2013
Utilizing Noise Addition for Data Privacy, an Overview
Kato Mivule
The internet is increasingly becoming a standard for both the production and consumption of data, while at the same time cyber-crime involving the theft of private data is growing. Therefore, in efforts to securely transact in data, privacy and security concerns must be taken into account to ensure that the confidentiality of the individuals and entities involved is not compromised, and that the published data complies with privacy laws. In this paper, we look at noise addition as one of the techniques for providing data privacy. This overview aims to give a foundational perspective on noise addition data privacy techniques, provide statistical considerations for noise addition techniques, and look at the current state of the art in the field, while outlining future areas of research.
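As a minimal illustration of additive noise and one of its statistical consequences, the sketch below perturbs a numeric column with zero-mean Gaussian noise and compares summary statistics. The data, the noise scale `sigma`, and the function name are assumptions chosen for the example, not values from the paper.

```python
import random
import statistics


def add_gaussian_noise(values, sigma, rng):
    """Return a perturbed copy: each value plus zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(0)
original = [float(v % 50) for v in range(1000)]  # toy numeric attribute
noisy = add_gaussian_noise(original, sigma=2.0, rng=rng)

# Zero-mean noise leaves the mean roughly unchanged; for noise independent
# of the data, the variance is expected to grow by about sigma squared.
print(statistics.mean(original), statistics.mean(noisy))
print(statistics.pvariance(original), statistics.pvariance(noisy))
```

Individual records change, which is the privacy benefit, while aggregate statistics such as the mean stay close to the original, which is the utility that noise addition tries to retain.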
CR Sep 16, 2013
A Review of Privacy Essentials for Confidential Mobile Data Transactions
Kato Mivule, Claude Turner
The rapidly increasing use of mobile devices for data transactions around the world has led to a new problem: how to engage in mobile data transactions while maintaining an acceptable level of data privacy and security. While most mobile devices engage in data transactions through a data cloud or a set of data servers, it is still possible to apply data confidentiality across data servers, thereby preserving privacy in any mobile data transaction. It nevertheless remains essential to review data privacy, data utility, and the techniques and methodologies employed in the data privacy process, as the underlying data privacy principles remain the same. In this paper, as a contribution, we present a review of the data privacy essentials that are fundamental to delivering appropriate analysis and implementing specific methodologies for various data privacy needs in mobile data transactions and computation.