CR · Mar 13

Almost-Free Queue Jumping for Prior Inputs in Private Neural Inference

arXiv:2603.12946 · 63.3
AI Analysis

This work addresses the need for flexible priority management in PP-MLaaS systems. The contribution is incremental, as it builds on existing mixed-primitive frameworks.

The paper tackles inefficient priority handling in privacy-preserving neural network inference by proposing PrivQJ, a framework that lets urgent requests jump the queue at almost no additional cryptographic cost. The authors report over an order-of-magnitude reduction in overhead compared to state-of-the-art systems.

Privacy-Preserving Machine Learning as a Service (PP-MLaaS) enables secure neural network inference by integrating cryptographic primitives such as homomorphic encryption (HE) and multi-party computation (MPC), protecting both client data and server models. Recent mixed-primitive frameworks have significantly improved inference efficiency, yet they process batched inputs sequentially, offering little flexibility for prioritizing urgent requests. Naïve queue jumping introduces considerable computational and communication overhead, adding non-negligible latency for in-queue inputs. We initiate the study of privacy-preserving queue jumping in batched inference and propose PrivQJ, a novel framework that enables efficient priority handling without degrading overall system performance. PrivQJ exploits shared computation across inputs via in-processing slot recycling, allowing prior inputs to be piggybacked onto ongoing batch computation with almost no additional cryptographic cost. Both theoretical analysis and experimental results demonstrate over an order-of-magnitude reduction in overhead compared to state-of-the-art PP-MLaaS systems.
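
The slot-recycling idea in the abstract can be illustrated with a toy plaintext model. This is only a sketch and not PrivQJ's actual protocol: it assumes that a batch is packed into the slots of one SIMD-style ciphertext, that some slots become free during processing, and that an urgent request can be written into those free slots so the next batched operation covers it too. The names `PackedBatch`, `piggyback`, and `apply_elementwise` are hypothetical, and no cryptography is performed.

```python
from dataclasses import dataclass


@dataclass
class PackedBatch:
    """Plaintext stand-in for one SIMD-packed ciphertext (hypothetical model)."""
    slots: list   # slot values; None marks a free slot
    owners: list  # id of the request occupying each slot; None if free

    def free_indices(self):
        return [i for i, owner in enumerate(self.owners) if owner is None]


def piggyback(batch, request_id, values):
    """Write an urgent request's values into free slots of an in-flight batch.

    Returns the slot indices used, or None if there is not enough room
    (a real system would then need some fallback, not modelled here).
    """
    free = batch.free_indices()
    if len(values) > len(free):
        return None
    used = free[:len(values)]
    for i, v in zip(used, values):
        batch.slots[i] = v
        batch.owners[i] = request_id
    return used


def apply_elementwise(batch, fn):
    """Model one batched operation applied to every occupied slot at once."""
    batch.slots = [fn(v) if v is not None else None for v in batch.slots]


# An 8-slot batch in which slots 2, 3, 6 and 7 are assumed to be free.
batch = PackedBatch(
    slots=[1, 2, None, None, 3, 4, None, None],
    owners=["a", "a", None, None, "b", "b", None, None],
)
used = piggyback(batch, "urgent", [10, 20])
apply_elementwise(batch, lambda v: v * 2)  # one op serves all three requests
print(used, batch.slots)
```

In this model the urgent request adds no extra batched operation, which is the intuition behind "almost no additional cryptographic cost"; the real cost analysis depends on details of the paper that this summary does not give.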

