CRAIAug 25, 2023

Falcon: Accelerating Homomorphically Encrypted Convolutions for Efficient Private Mobile Network Inference

arXiv:2308.13189v113 citationsh-index: 99
Originality Incremental advance
AI Analysis

This work addresses the inefficiency in private mobile network inference for users requiring secure computation, representing an incremental improvement over existing methods.

The paper tackles the high inference overhead in homomorphically encrypted two-party computation frameworks for efficient mobile networks by proposing Falcon, a dense packing algorithm that reduces latency by up to 15.6x at the operator level and improves accuracy by up to 4.2% at the network level.

Efficient networks, e.g., MobileNetV2, EfficientNet, etc, achieves state-of-the-art (SOTA) accuracy with lightweight computation. However, existing homomorphic encryption (HE)-based two-party computation (2PC) frameworks are not optimized for these networks and suffer from a high inference overhead. We observe the inefficiency mainly comes from the packing algorithm, which ignores the computation characteristics and the communication bottleneck of homomorphically encrypted depthwise convolutions. Therefore, in this paper, we propose Falcon, an effective dense packing algorithm for HE-based 2PC frameworks. Falcon features a zero-aware greedy packing algorithm and a communication-aware operator tiling strategy to improve the packing density for depthwise convolutions. Compared to SOTA HE-based 2PC frameworks, e.g., CrypTFlow2, Iron and Cheetah, Falcon achieves more than 15.6x, 5.1x and 1.8x latency reduction, respectively, at operator level. Meanwhile, at network level, Falcon allows for 1.4% and 4.2% accuracy improvement over Cheetah on CIFAR-100 and TinyImagenet datasets with iso-communication, respecitvely.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes