LG AI CR NI | Feb 22, 2025

Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference Services

Zhipeng Cheng, Xiaoyu Xia, Hong Wang, Minghui Liwang, Ning Chen, Xuwei Fan, Xianbin Wang

arXiv:2502.16091v3 | 6 citations | h-index: 21 | IEEE Trans Serv Comput

Originality: Incremental advance

AI Analysis

This work addresses privacy and efficiency issues in edge inference services for applications such as IoT and mobile computing. The advance is incremental, since it builds on existing optimization and game-theoretic methods.

The paper tackles the challenge of deploying DNN models on resource-constrained edge devices. It proposes a privacy-aware optimization framework that jointly handles model deployment, user-server association, and model partitioning. In extensive simulations, the framework significantly reduces inference delay relative to state-of-the-art baselines while consistently meeting privacy constraints.

Edge inference (EI) has emerged as a promising paradigm to address the growing limitations of cloud-based Deep Neural Network (DNN) inference services, such as high response latency, limited scalability, and severe data privacy exposure. However, deploying DNN models on resource-constrained edge devices introduces additional challenges, including limited computation/storage resources, dynamic service demands, and heightened privacy risks. To tackle these issues, this paper presents a novel privacy-aware optimization framework that jointly addresses DNN model deployment, user-server association, and model partitioning, with the goal of minimizing long-term average inference delay under resource and privacy constraints. The problem is formulated as a complex, NP-hard stochastic optimization. To efficiently handle system dynamics and computational complexity, we employ a Lyapunov-based approach to transform the long-term objective into tractable per-slot decisions. Furthermore, we introduce a coalition formation game to enable adaptive user-server association and design a greedy algorithm for model deployment within each coalition. Extensive simulations demonstrate that the proposed algorithm significantly reduces inference delay and consistently satisfies privacy constraints, outperforming state-of-the-art baselines across diverse scenarios.
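The abstract's Lyapunov step can be illustrated with a generic drift-plus-penalty sketch. This is not the paper's formulation: the two partition options, their delay and privacy-risk values, the risk budget, and the weight `v` are all hypothetical, and a single user with a fixed option set stands in for the full deployment and association problem. The idea shown is only the general one: a virtual queue tracks how far the accumulated privacy risk exceeds a long-term budget, and each slot picks the option minimizing `v * delay + queue * risk`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One hypothetical DNN partition point (all numbers are made up)."""
    name: str
    delay: float         # inference delay incurred in a slot
    privacy_risk: float  # privacy leakage incurred in a slot


def choose_option(options, queue, v):
    """Pick the option with the smallest drift-plus-penalty score."""
    return min(options, key=lambda o: v * o.delay + queue * o.privacy_risk)


def update_queue(queue, risk, budget):
    """Virtual queue: backlog grows when a slot's risk exceeds the budget."""
    return max(queue + risk - budget, 0.0)


def simulate(options, budget, v, slots):
    """Apply the per-slot rule and return (average delay, average risk)."""
    queue = 0.0
    total_delay = 0.0
    total_risk = 0.0
    for _ in range(slots):
        chosen = choose_option(options, queue, v)
        total_delay += chosen.delay
        total_risk += chosen.privacy_risk
        queue = update_queue(queue, chosen.privacy_risk, budget)
    return total_delay / slots, total_risk / slots


# Hypothetical trade-off: splitting early is fast but exposes more data.
OPTIONS = [
    Option("split_early", delay=1.0, privacy_risk=0.9),
    Option("split_late", delay=3.0, privacy_risk=0.1),
]

if __name__ == "__main__":
    avg_delay, avg_risk = simulate(OPTIONS, budget=0.5, v=10.0, slots=2000)
    print(f"average delay={avg_delay:.3f}, average risk={avg_risk:.3f}")
```

In this toy setting a larger `v` favors low delay and lets the virtual queue grow longer before the safer option is chosen, so the long-term risk budget is met more slowly. That is the usual trade-off in drift-plus-penalty methods.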
