DCLGNINov 30, 2025

Joint Partitioning and Placement of Foundation Models for Real-Time Edge AI

arXiv:2512.01039v12 citationsh-index: 13
Originality Incremental advance
AI Analysis

This addresses the challenge of real-time AI inference in edge computing for applications like 6G networks, though it is incremental as it builds on existing partitioning and placement techniques.

The paper tackles the problem of running large foundation models on heterogeneous edge devices with volatile resources by introducing a framework that dynamically partitions and places model layers at runtime, achieving up to 40% lower latency and 35% higher resource utilization compared to static approaches.

Inference over large-scale foundation models within heterogeneous edge environments necessitates a fundamentally reconfigurable orchestration substrate. Static partitioning of model layers presumes temporal stability across compute and network resources, which is misaligned with the volatility of real-world deployments. We introduce a framework in which both the spatial placement and internal segmentation of foundation models are elevated to runtime-resolved constructs. The orchestration problem is formalized as a constrained optimization over layer-wise assignments, subject to evolving latency, utilization, and privacy gradients. The framework implements reactive inference composition responsive to infrastructural fluctuations by integrating model-aware capacity profiling with dynamic graph re-partitioning and reallocation. We introduce architectural and algorithmic components, along with a representative use case in 6G multi-access edge computing.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes