CR · AI · CL · LG · Jul 3, 2024

ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets

arXiv:2407.02960v2 · 5 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses a privacy and security challenge for organizations that run proprietary LLMs on sensitive data, though it is an incremental advance that combines existing techniques.

The paper tackles fine-tuning and inference of proprietary large language models on private datasets while keeping both the model and the data confidential. It does so by combining an obfuscation technique with confidential computing, placing only 5% of the model parameters in a trusted execution environment (TEE).

This work addresses the timely yet underexplored problem of performing inference and fine-tuning of a proprietary LLM owned by a model provider on the confidential or private data of a separate data owner, in a way that ensures the confidentiality of both the model and the data. Here, the fine-tuning is conducted offsite, i.e., on the computation infrastructure of a third-party cloud provider. We tackle this problem by proposing ObfuscaTune, a novel, efficient and fully utility-preserving approach that combines a simple yet effective obfuscation technique with an efficient use of confidential computing (only 5% of the model parameters are placed in a trusted execution environment, TEE). We empirically demonstrate the effectiveness of ObfuscaTune by validating it on GPT-2 models of different sizes on four NLP benchmark datasets. Finally, we compare against a naïve version of our approach to highlight the necessity of using random matrices with low condition numbers to reduce the errors induced by the obfuscation.
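The abstract's remark about condition numbers can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes the obfuscation multiplies the input by a random matrix `R` and the weights by `R^{-1}`, so that the product is unchanged in exact arithmetic. The function names, matrix sizes, and the use of float32 are all hypothetical choices made for illustration.

```python
import numpy as np


def random_orthogonal(n, rng):
    # The Q factor of a Gaussian matrix is orthogonal, so its condition number is 1.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


def obfuscation_error(r, x, w):
    # Assumed scheme (illustrative only): hide x as x @ R and w as R^{-1} @ w.
    # In exact arithmetic (x @ R) @ (R^{-1} @ w) equals x @ w.
    r32 = r.astype(np.float32)
    r_inv32 = np.linalg.inv(r).astype(np.float32)
    x_obf = x.astype(np.float32) @ r32
    w_obf = r_inv32 @ w.astype(np.float32)
    y_obf = x_obf @ w_obf          # computed in float32 on obfuscated operands
    y_ref = x @ w                  # float64 reference without obfuscation
    return float(np.max(np.abs(y_obf - y_ref)) / np.max(np.abs(y_ref)))


rng = np.random.default_rng(0)
n = 256
x = rng.standard_normal((8, n))
w = rng.standard_normal((n, n)) / np.sqrt(n)

# Well-conditioned obfuscation matrix (orthogonal).
r_low = random_orthogonal(n, rng)
# Ill-conditioned matrix built with singular values spanning 1 to 1e6.
r_high = r_low @ np.diag(np.logspace(0, 6, n)) @ random_orthogonal(n, rng)

err_low = obfuscation_error(r_low, x, w)
err_high = obfuscation_error(r_high, x, w)
print("cond(R_low) =", np.linalg.cond(r_low), " relative error =", err_low)
print("cond(R_high) =", np.linalg.cond(r_high), " relative error =", err_high)
```

Under these assumptions, the ill-conditioned matrix amplifies float32 rounding error far more than the orthogonal one, which is consistent with the abstract's argument for preferring low condition numbers.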
