DC · LG · Dec 16, 2024

Priority-Aware Model-Distributed Inference at Edge Networks

arXiv:2412.12371v1 · 3 citations · h-index: 20 · 2025 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN)
Originality: Incremental advance
AI Analysis

This work addresses efficient inference scheduling for edge networks with prioritized data sources, representing an incremental improvement over existing model-distributed methods.

The paper tackles model-distributed inference when multiple data sources with different priorities co-exist. It formulates an optimization problem for model allocation and designs the Priority-Aware MDI (PA-MDI) algorithm, which reduces inference time compared to baselines in experiments on edge devices with models such as ResNet-50, ResNet-56, and GPT-2.

Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire Machine Learning (ML) model but processes only a subset of the data. However, feeding the data to workers results in high communication costs, especially when the data is large. An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of the ML layers. In MDI, a source device that has data processes a few layers of the ML model and sends the output to a neighboring device, i.e., offloads the rest of the layers. This process ends when all layers have been processed in a distributed manner. In this paper, we investigate the design and development of MDI when multiple data sources co-exist. We consider that each data source has a different importance and, hence, a different priority. We formulate and solve a priority-aware model allocation optimization problem. Based on the structure of the optimal solution, we design a practical Priority-Aware Model-Distributed Inference (PA-MDI) algorithm that determines model allocation and distribution over devices by taking into account the priorities of different sources. Experiments were conducted on a real-life testbed of NVIDIA Jetson Xavier and Nano edge devices as well as in the Colosseum testbed with ResNet-50, ResNet-56, and GPT-2 models. The experimental results show that PA-MDI performs priority-aware model allocation successfully while reducing the inference time compared to baselines.
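The MDI chain described in the abstract (each device processes a block of layers and forwards the intermediate output to a neighbor) can be illustrated with a minimal Python sketch. It is not the paper's PA-MDI algorithm. The function names (`split_layers`, `run_mdi`, `priority_shares`, `chain_latency`) are hypothetical, the layers are toy scalar functions, device speeds are assumed known, communication delay is ignored, and priority is modelled by an assumed rule in which each source receives a share of every device's speed proportional to its priority weight.

```python
from typing import Callable, Dict, List, Sequence

# A "layer" here is a toy scalar function; real layers would map tensors to tensors.
Layer = Callable[[float], float]


def split_layers(num_layers: int, speeds: Sequence[float]) -> List[int]:
    """Give each worker in the chain a contiguous block of layers sized
    proportionally to its speed (largest-remainder rounding keeps the sum exact)."""
    total = sum(speeds)
    exact = [num_layers * s / total for s in speeds]
    counts = [int(e) for e in exact]
    leftover = num_layers - sum(counts)
    # Hand the layers lost to rounding to the workers with the largest fractional parts.
    by_remainder = sorted(range(len(speeds)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def run_mdi(layers: Sequence[Layer], x: float, counts: Sequence[int]) -> float:
    """Execute the model as a chain: each worker applies its block of layers and
    forwards the intermediate output to the next worker (simulated in one process)."""
    start = 0
    for n in counts:
        for layer in layers[start:start + n]:
            x = layer(x)
        start += n  # in a real deployment, x would be transmitted to the next device here
    return x


def priority_shares(speeds: Sequence[float], priorities: Dict[str, float]) -> Dict[str, List[float]]:
    """Hypothetical priority model: every worker's speed is divided among the
    co-existing sources in proportion to their priority weights."""
    total_p = sum(priorities.values())
    return {src: [s * p / total_p for s in speeds] for src, p in priorities.items()}


def chain_latency(counts: Sequence[int], speeds: Sequence[float]) -> float:
    """Compute-only latency of one sample along the chain (communication ignored)."""
    return sum(n / v for n, v in zip(counts, speeds))


if __name__ == "__main__":
    layers = [lambda v, k=k: v + k for k in range(10)]  # 10 toy layers, layer k adds k
    speeds = [1.0, 2.0, 2.0]  # assumed relative speeds of three devices
    counts = split_layers(len(layers), speeds)
    print(counts)  # [2, 4, 4]
    print(run_mdi(layers, 0.0, counts))  # 45.0, same as running all layers on one device
    shares = priority_shares(speeds, {"high": 3.0, "low": 1.0})
    for src, src_speeds in shares.items():
        print(src, chain_latency(counts, src_speeds))
```

The proportional split balances per-stage compute time, which matters when many samples are pipelined through the chain; under the assumed sharing rule the higher-priority source sees a lower chain latency. This fixed rule is only a stand-in for the optimization-based allocation the paper formulates.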

