CR · Mar 13

Almost-Free Queue Jumping for Prior Inputs in Private Neural Inference

Qiao Zhang, Minghui Xu, Tingchuang Zhang, Xiuzhen Cheng

arXiv:2603.12946

AI Analysis

This work addresses the need for flexible priority management in PP-MLaaS systems. The contribution is incremental, since it builds on existing mixed-primitive frameworks.

The paper tackles inefficient priority handling in privacy-preserving neural network inference by proposing PrivQJ, a framework that lets urgent requests jump the queue at almost no additional cryptographic cost and reduces overhead by more than an order of magnitude compared with state-of-the-art systems.

Privacy-Preserving Machine Learning as a Service (PP-MLaaS) enables secure neural network inference by integrating cryptographic primitives such as homomorphic encryption (HE) and multi-party computation (MPC), protecting both client data and server models. Recent mixed-primitive frameworks have significantly improved inference efficiency, yet they process batched inputs sequentially, offering little flexibility for prioritizing urgent requests. Naïve queue jumping introduces considerable computational and communication overhead, adding non-negligible latency for in-queue inputs. We initiate the study of privacy-preserving queue jumping in batched inference and propose PrivQJ, a novel framework that enables efficient priority handling without degrading overall system performance. PrivQJ exploits shared computation across inputs via in-processing slot recycling, allowing prior inputs to be piggybacked onto ongoing batch computation with almost no additional cryptographic cost. Both theoretical analysis and experimental results demonstrate over an order-of-magnitude reduction in overhead compared to state-of-the-art PP-MLaaS systems.
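The intuition behind "shared computation across inputs" can be illustrated with a toy plaintext model. It assumes SIMD-style slot packing of the kind HE schemes such as BFV or CKKS provide, where a single ciphertext operation acts on every slot at once. Everything here is hypothetical: `PackedBatch`, `pack`, and `LAYERS` are illustrative names, no cryptography is performed, and the sketch does not reproduce PrivQJ's in-processing slot recycling, whose details the abstract does not give. It only counts vector operations to compare running an urgent input as its own pass with placing it in a free slot of the batch.

```python
"""Toy plaintext model of SIMD slot sharing (hypothetical; no real HE).

A packed "ciphertext" is a fixed-width list of slots. Each layer is applied
to all slots with one vectorized op, so an input placed in an otherwise
unused slot rides along at no extra op cost.
"""
from dataclasses import dataclass
from typing import List, Optional, Tuple

Slot = Optional[float]  # None marks an unused (padding) slot


@dataclass
class PackedBatch:
    slots: List[Slot]
    ops: int = 0  # counts vector ("ciphertext-level") operations

    def place(self, value: float) -> int:
        """Put `value` into the first free slot and return its index."""
        for i, v in enumerate(self.slots):
            if v is None:
                self.slots[i] = value
                return i
        raise ValueError("no free slot: batch is full")

    def run(self, layers: List[Tuple[float, float]]) -> None:
        """Apply each (weight, bias) layer to every occupied slot at once."""
        for w, b in layers:
            self.slots = [None if v is None else w * v + b for v in self.slots]
            self.ops += 1  # one op per layer, however many slots are occupied


def pack(inputs: List[float], width: int) -> PackedBatch:
    """Pack inputs into a batch of `width` slots, padding with None."""
    if len(inputs) > width:
        raise ValueError("more inputs than slots")
    return PackedBatch(inputs + [None] * (width - len(inputs)))


LAYERS = [(2.0, 1.0), (0.5, -1.0), (3.0, 0.0)]  # hypothetical toy "network"

# Naive queue jumping: the urgent input gets its own pass before the batch.
queued = pack([1.0, 2.0, 3.0], width=4)
urgent_alone = pack([10.0], width=4)
urgent_alone.run(LAYERS)
queued.run(LAYERS)
naive_ops = urgent_alone.ops + queued.ops

# Piggybacking: the urgent input occupies the batch's free slot.
shared = pack([1.0, 2.0, 3.0], width=4)
idx = shared.place(10.0)
shared.run(LAYERS)

print(naive_ops, shared.ops)                       # 6 3
print(urgent_alone.slots[0], shared.slots[idx])    # 28.5 28.5
```

In this toy, the naive pass leaves three of its four slots empty and doubles the operation count, and the queued inputs wait for it to finish. The piggybacked version produces the same result for the urgent input within the batch's own three operations. Real HE costs such as rotations, encoding, and MPC-evaluated nonlinear layers are not modeled.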

