LG, CR | Nov 2, 2022

MPCFormer: fast, performant and private Transformer inference with MPC

Dacheng Li, Rulin Shao, Hongyi Wang, Han Guo, Eric P. Xing, Hao Zhang

arXiv:2211.01452v227.6138 citations | h-index: 46 | Has Code

Originality: Incremental advance

AI Analysis

MPCFormer addresses the need for efficient, private inference in cloud-based Transformer services. It represents a strong, specific gain rather than a foundational advance.

The paper tackles private inference for Transformer models in cloud services, where existing solutions are either slow or degrade output quality. It proposes MPCFormer, a framework that combines Secure Multi-Party Computation (MPC) and Knowledge Distillation (KD). On IMDb, MPCFormer achieves performance similar to BERT_BASE with a 5.3x speedup; on GLUE, it reaches 97% of BERT_BASE's performance with a 2.2x speedup.

Enabling private inference is crucial for many cloud inference services that are based on Transformer models. However, existing private inference solutions can increase the inference latency by more than 60x or significantly compromise the inference quality. In this paper, we design the framework MPCFormer as a practical solution, using Secure Multi-Party Computation (MPC) and Knowledge Distillation (KD). Through extensive evaluations, we show that MPCFormer significantly speeds up Transformer inference in MPC settings while achieving similar ML performance to the input model. On the IMDb dataset, it achieves similar performance to BERT_BASE, while being 5.3x faster. On the GLUE benchmark, it achieves 97% of the performance of BERT_BASE with a 2.2x speedup. MPCFormer remains effective with different trained Transformer weights such as RoBERTa_BASE and larger models including BERT_LARGE. Code is available at https://github.com/MccRee177/MPCFormer.
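For readers unfamiliar with MPC, the following is a minimal sketch of additive secret sharing, one common building block of MPC protocols. It is a generic illustration and not MPCFormer's actual protocol or code; the ring size `MODULUS` and the helper names `share`, `reconstruct`, `add_shared`, and `scale_shared` are assumptions chosen for this example.

```python
import secrets

# Illustrative ring size (an assumption, not taken from the paper).
MODULUS = 2**64


def share(value, n_parties=2):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret by summing all shares mod MODULUS."""
    return sum(shares) % MODULUS


def add_shared(a, b):
    """Each party adds its own two shares locally; no communication needed."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def scale_shared(a, public_constant):
    """Each party multiplies its share by a public constant locally."""
    return [(x * public_constant) % MODULUS for x in a]


x_shares = share(12)
y_shares = share(30)
total = reconstruct(add_shared(x_shares, y_shares))   # 12 + 30
scaled = reconstruct(scale_shared(x_shares, 3))       # 12 * 3
print(total, scaled)
```

Additions and multiplications by public constants are cheap because each party works only on its own share. Nonlinear functions generally require interaction between parties, which is one reason private Transformer inference tends to be much slower than plaintext inference.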
