GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs
This work addresses the matrix multiplication bottleneck in FHE-based DNNs, for practitioners who need efficient encrypted inference.
The authors propose a GPU-accelerated sparse matrix multiplication method for fully homomorphic encrypted deep neural networks, achieving up to 3.0× speedup over CPU and reducing time complexity from cubic to semi-linear.
Fully homomorphic encryption (FHE) has recently attracted significant attention as both a cryptographic primitive and a systems challenge. Given the latest advances in accelerated computing, FHE presents a promising opportunity for progress, with applications ranging from machine learning to information security. We target the most computationally intensive operation in deep neural networks from a hardware perspective, matrix multiplication (matmul), and adapt it for execution on AMD GPUs. We propose a new optimized method that improves the runtime and complexity of ciphertext matmul by using FIDESlib, a recent open-source FHE library designed specifically for GPUs. By exploiting sparsity in both operands, our sparse matmul implementation outperforms its CPU counterpart by up to $3.0\times$ and reduces the time complexity from cubic to semi-linear, demonstrating an improvement over existing FHE matmul implementations.
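To illustrate why exploiting sparsity in both operands can lower the operation count, the following is a minimal plaintext sketch in Python. It is not the authors' GPU implementation and does not use FIDESlib or any encryption; the helper names `to_sparse_rows` and `sparse_matmul` are hypothetical and chosen only for this example. It assumes a simple row-wise dictionary representation and counts scalar multiplications, which in an FHE setting would correspond to costly ciphertext multiplications.

```python
from collections import defaultdict


def to_sparse_rows(dense):
    """Keep only nonzero entries: row index -> {column index: value}."""
    return {
        i: {j: v for j, v in enumerate(row) if v != 0}
        for i, row in enumerate(dense)
    }


def sparse_matmul(a_rows, b_rows):
    """Multiply two row-sparse matrices and count scalar multiplications.

    Only pairs (A[i][k], B[k][j]) where both entries are nonzero are visited,
    so the work scales with the number of matching nonzeros rather than n^3.
    """
    result = defaultdict(dict)
    mults = 0
    for i, a_row in a_rows.items():
        for k, a_val in a_row.items():
            # Skip entirely if row k of B has no nonzero entries.
            for j, b_val in b_rows.get(k, {}).items():
                result[i][j] = result[i].get(j, 0) + a_val * b_val
                mults += 1
    return dict(result), mults


# Two 3x3 matrices with one nonzero per row (illustrative data only).
A = [[1, 0, 0],
     [0, 0, 2],
     [0, 3, 0]]
B = [[0, 4, 0],
     [5, 0, 0],
     [0, 0, 6]]

product, mults = sparse_matmul(to_sparse_rows(A), to_sparse_rows(B))
dense_mults = len(A) ** 3  # multiplications a naive dense triple loop would use
```

In this toy case the sparse routine performs 3 multiplications where a dense triple loop would perform 27. How closely a real ciphertext matmul follows this pattern depends on how values are packed into ciphertexts, which this sketch does not model.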