LG CL CR
May 28, 2023

LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers

arXiv:2305.18396v3
38 citations
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns for users in server-client LLM settings by reducing inference costs, though it is incremental as it builds on existing frameworks like Iron.

The paper tackles the problem of high overhead in private inference for transformer-based LLMs by substituting heavy operators with privacy-computing friendly approximations, achieving a 5x acceleration in computation and an 80% reduction in communication overhead with nearly identical accuracy.

The community has explored building private inference frameworks for transformer-based large language models (LLMs) in a server-client setting, where the server holds the model parameters and the client supplies its private data (or prompt) for inference. However, these frameworks impose significant overhead when the private inputs are forward propagated through the original LLMs. In this paper, we show that substituting the computation- and communication-heavy operators in the transformer architecture with privacy-computing friendly approximations can greatly reduce private inference costs while having only a minor impact on model performance. Compared to the state-of-the-art Iron (NeurIPS 2022), our privacy-computing friendly model inference pipeline achieves a $5\times$ acceleration in computation and an 80% reduction in communication overhead, while retaining nearly identical accuracy.
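To make the idea of operator substitution concrete, here is a minimal plaintext sketch. The abstract does not say which operators are replaced or what the approximations are, so everything below is an assumption for illustration only: it compares a standard softmax with a hypothetical ReLU-then-normalize stand-in (`relu_normalize` is an invented name). The stand-in avoids the exponential, which is the kind of nonlinear operation that tends to be expensive in secure computation. No encryption or secret sharing is performed here.

```python
import math


def softmax(scores):
    """Standard softmax over a list of attention scores (uses exp)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relu_normalize(scores, eps=1e-6):
    """Hypothetical stand-in (an assumption, not the paper's stated method):
    clip negatives to zero, then divide by the sum. No exponential needed."""
    clipped = [max(s, 0.0) for s in scores]
    total = sum(clipped) + eps  # eps guards against an all-negative row
    return [c / total for c in clipped]


scores = [2.0, 1.0, -1.0, 0.5]
exact = softmax(scores)
approx = relu_normalize(scores)

# Both produce non-negative weights that sum to (approximately) one,
# and on this input both rank the same position highest.
print([round(w, 3) for w in exact])
print([round(w, 3) for w in approx])
```

The two outputs differ numerically, so a substitution like this would normally require fine-tuning or other compensation to keep accuracy close to the original model. The abstract reports nearly identical accuracy but does not describe how that is achieved.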

Code Implementations: 1 repo