NI · Apr 28

Design Insights into Partition Placement and Routing for DNN Inference in Multi-Hop Edge Networks

arXiv:2604.25571 · 4.6
AI Analysis

This work provides practical insights for deploying latency-sensitive DNN inference in heterogeneous edge networks, though the method is incremental and specific to fixed-partition models.

The paper addresses the coupled problem of partition placement and routing for DNN inference in multi-hop edge networks, proposing a congestion-aware alternating framework. Numerical evaluations show that split flexibility is crucial in IoT-edge-cloud settings and congestion-aware refinement becomes more beneficial under higher load.

Partitioned DNN inference is a promising approach for latency-sensitive intelligent services in edge networks, since it allows different parts of a model to be executed across end devices, edge servers, and the cloud. However, in a multi-hop edge network, partition placement and inference traffic routing are inherently coupled: raw inputs, intermediate features, and final outputs may have very different sizes, while candidate nodes also differ in computation capability. In addition, both communication and computation delays can become congestion-dependent under load. In this paper, we study joint partition placement and routing for fixed-partition DNN inference over heterogeneous multi-hop edge networks. We consider a small number of DNN partitions, each placed at exactly one node without replication, and formulate a congestion-aware mixed discrete-continuous optimization problem that captures both routing and execution costs. To solve it, we develop a practical alternating framework that couples partition placement with congestion-aware forwarding updates. Through numerical evaluation on hierarchical, regular, synthetic irregular, and real backbone-inspired topologies, we show that split flexibility is particularly important in IoT-edge-cloud settings, while congestion-aware refinement becomes increasingly beneficial as the offered load grows. We further illustrate how the preferred operating point depends on the communication-computation tradeoff.
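
The alternating idea in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the linear congestion model, the brute-force placement search, the single-flow setup, and every name here (`alternate`, `link_delay`, the three-node topology and its numbers) are assumptions made for illustration only. Each round fixes the link weights, picks the placement that looks cheapest under them, routes along shortest paths, and then re-prices the links from the resulting load.

```python
import heapq
from itertools import product

def dijkstra(adj, weight, src):
    """Shortest-path distances and predecessors from src under link weights."""
    dist = {v: float("inf") for v in adj}
    prev = {}
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v in adj[u]:
            nd = d + weight[(u, v)]
            if nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def path(prev, src, dst):
    """Node sequence from src to dst; [src] when src == dst."""
    nodes = [dst]
    while nodes[-1] != src:
        nodes.append(prev[nodes[-1]])
    return nodes[::-1]

def link_delay(load, cap):
    """Assumed per-unit-data delay that grows linearly with link load."""
    return (1.0 + load / cap) / cap

def alternate(adj, cap, speed, src, dst, data, work, iters=10):
    """data[k]: size sent after stage k (data[0] is the raw input);
    work[k]: compute demand of partition k; speed[v]: node capability."""
    load = {e: 0.0 for e in cap}
    best = None
    for _ in range(iters):
        # Price links under the current load, then get all shortest paths.
        weight = {e: link_delay(load[e], cap[e]) for e in cap}
        sp = {u: dijkstra(adj, weight, u) for u in adj}

        def estimate(pl):
            stops = (src, *pl, dst)
            comm = sum(data[k] * sp[stops[k]][0][stops[k + 1]]
                       for k in range(len(data)))
            comp = sum(work[k] / speed[pl[k]] for k in range(len(work)))
            return comm + comp

        # Placement step: exhaustive search, viable only for few partitions.
        pl = min(product(adj, repeat=len(work)), key=estimate)

        # Routing step: send each segment along its shortest path.
        stops = (src, *pl, dst)
        segments = [path(sp[stops[k]][1], stops[k], stops[k + 1])
                    for k in range(len(data))]
        load = {e: 0.0 for e in cap}
        for k, seg in enumerate(segments):
            for e in zip(seg, seg[1:]):
                load[e] += data[k]

        # Re-evaluate the chosen placement under the load it induces.
        comm = sum(data[k] * link_delay(load[e], cap[e])
                   for k, seg in enumerate(segments)
                   for e in zip(seg, seg[1:]))
        cost = comm + sum(work[k] / speed[pl[k]] for k in range(len(work)))
        if best is None or cost < best[0]:
            best = (cost, pl)
    return best

# Hypothetical IoT-edge-cloud line topology with made-up numbers.
adj = {"iot": ["edge"], "edge": ["iot", "cloud"], "cloud": ["edge"]}
cap = {("iot", "edge"): 5.0, ("edge", "iot"): 5.0,
       ("edge", "cloud"): 10.0, ("cloud", "edge"): 10.0}
speed = {"iot": 1.0, "edge": 4.0, "cloud": 16.0}
cost, placement = alternate(adj, cap, speed, "iot", "iot",
                            data=[10.0, 2.0, 0.1], work=[4.0, 8.0])
print(placement, round(cost, 4))
```

In this toy instance the load-free first round favours running the first partition at the edge, but once the large raw input congests the access link the search settles on keeping the first partition on the device and sending only the smaller intermediate feature to the cloud. The loop keeps the best placement seen because a plain alternation like this can oscillate between candidates.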
