CR · LG · Feb 4, 2022

Selective Network Linearization for Efficient Private Inference

arXiv:2202.02340v2 · 55 citations · Has Code
AI Analysis

This work addresses the efficiency bottleneck in private inference, enabling faster and more accurate secure data processing for privacy-sensitive applications.

The paper tackles the high latency of private inference (PI) by selectively linearizing ReLU activations. Compared to state-of-the-art methods, it achieves up to 4.25% higher accuracy at the same ReLU count (50K) or 2.2x lower latency at the same accuracy (70%).

Private inference (PI) enables inference directly on cryptographically secure data. While promising to address many privacy issues, it has seen limited use due to extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI non-linear functions (namely ReLU) are the bottleneck. Thus, practical PI demands novel ReLU-aware optimizations. To reduce PI latency, we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to 4.25% more accuracy (iso-ReLU count at 50K) or 2.2× less latency (iso-accuracy at 70%) than the current state of the art and advance the Pareto frontier across the latency-accuracy space. To complement empirical results, we present a "no free lunch" theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy. Public code is available at https://github.com/NYU-DICE-Lab/selective_network_linearization.
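
The abstract does not spell out the algorithm, so the following is only a toy sketch of what "gradient-based selective linearization" could look like, not the authors' implementation. It assumes each ReLU gets a learnable mask c in [0, 1] that blends the ReLU with the identity, and that an L1 penalty on the masks (weight `lam`) pushes unneeded ReLUs toward the identity. The names `masked_relu`, `select_relus`, `lam`, `lr`, and the four-unit "network" are all hypothetical and chosen for illustration.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def masked_relu(x, c):
    # Assumed parameterization: c = 1 keeps the ReLU, c = 0 makes it the identity.
    return c * relu(x) + (1.0 - c) * x

def select_relus(weights, samples, lam=0.02, lr=0.5, steps=400):
    """Projected gradient descent on the masks of a toy model.

    Toy model: output = sum_i weights[i] * act_i(x[i]).
    Loss: mean squared error against the all-ReLU model + lam * sum(c).
    """
    n = len(weights)
    c = [1.0] * n  # start with every ReLU kept
    for _ in range(steps):
        grad = [lam] * n  # gradient of the L1 penalty (c is kept non-negative)
        for x in samples:
            teacher = sum(w * relu(xi) for w, xi in zip(weights, x))
            student = sum(w * masked_relu(xi, ci)
                          for w, xi, ci in zip(weights, x, c))
            err = student - teacher
            for i in range(n):
                # d(student)/d(c_i) = w_i * (relu(x_i) - x_i)
                grad[i] += 2.0 * err * weights[i] * (relu(x[i]) - x[i]) / len(samples)
        # Gradient step, then project each mask back onto [0, 1].
        c = [min(1.0, max(0.0, ci - lr * g)) for ci, g in zip(c, grad)]
    return c

rng = random.Random(0)
samples = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(256)]
weights = [1.0, 1.0, 0.01, 0.0]  # the last two units barely affect the output

masks = select_relus(weights, samples)
keep = [ci > 0.5 for ci in masks]  # threshold the masks into a keep/linearize decision
relu_count = sum(keep)
print(keep, relu_count)
```

In this toy setup the two units with negligible weight should end up linearized while the two influential ones keep their ReLU, which mirrors the trade-off the abstract describes: fewer ReLUs (lower PI latency) for little loss in accuracy. A real network would use automatic differentiation and a fine-tuning stage rather than the hand-written gradient above.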

Code Implementations: 1 repo
