NI · AI · Feb 17

High-Fidelity Network Management for Federated AI-as-a-Service: Cross-Domain Orchestration

arXiv:2602.15281v2 · h-index: 10
Originality: Incremental advance
AI Analysis

This addresses the operational challenge that communication service providers (CSPs) face in managing AIaaS with reliable end-to-end delivery, although it appears incremental because it builds on existing stochastic network calculus methods.

The paper tackles the problem of ensuring high-fidelity AI-as-a-Service (AIaaS) across federated multi-domain networks by introducing an assurance-oriented management plane based on Tail-Risk Envelopes (TREs). In simulation, admission control improves p99.9 compliance under overload, and tenant isolation remains robust under correlated burstiness.
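A p99.9 compliance claim rests on estimating an extreme percentile from a finite number of delay samples, so the estimate itself carries uncertainty. The page does not say how the paper estimates percentiles; the sketch below only illustrates the idea under stated assumptions: delay samples are treated as i.i.d. (optimistic when traffic is correlated and bursty), the interval is a normal approximation to the distribution-free order-statistic (binomial) confidence interval, and the names `quantile_with_ci` and `audit_p999` are hypothetical.

```python
import math
import random


def quantile_with_ci(samples, q=0.999, z=1.96):
    """Empirical q-quantile of delay samples with an approximate confidence interval.

    Assumes i.i.d. samples (a hypothetical simplification). The number of samples
    below the true q-quantile is Binomial(n, q); a normal approximation of that
    count selects the order statistics bracketing the quantile (~95% for z = 1.96).
    """
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    centre = n * q
    half = z * math.sqrt(n * q * (1.0 - q))
    # Convert 1-based ranks to 0-based indices, clamped to the available samples.
    est = xs[min(n - 1, max(0, math.ceil(centre) - 1))]
    lo = xs[max(0, math.floor(centre - half) - 1)]
    hi = xs[min(n - 1, max(0, math.ceil(centre + half) - 1))]
    return est, lo, hi


def audit_p999(samples, target):
    """Three-way verdict: claim compliance only if the whole interval is under target."""
    est, lo, hi = quantile_with_ci(samples)
    if hi <= target:
        verdict = "compliant"
    elif lo > target:
        verdict = "violated"
    else:
        verdict = "inconclusive"
    return est, lo, hi, verdict


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic telemetry: exponential delays with mean 1 ms (true p99.9 = ln(1000) ms).
    delays = [rng.expovariate(1.0) for _ in range(100_000)]
    for target in (6.0, 8.0):
        print(target, audit_p999(delays, target))
```

Reporting "inconclusive" when the interval straddles the target avoids declaring compliance on noise. With 100,000 samples only about 100 lie above the p99.9, which is why the interval is comparatively wide.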

To support the emergence of AI-as-a-Service (AIaaS), communication service providers (CSPs) are on the verge of a radical transformation: from pure connectivity providers to providers of AIaaS as a managed network service (a control-and-orchestration plane that exposes AI models). In this model, the CSP is responsible not only for transport/communications but also for intent-to-model resolution and joint network-compute orchestration, i.e., reliable and timely end-to-end delivery. The resulting end-to-end AIaaS service is therefore governed by both communication impairments (delay, loss) and inference impairments (latency, error). A central open problem is the design of an operational AIaaS control-and-orchestration framework that enforces high fidelity, particularly under multi-domain federation. This paper introduces an assurance-oriented AIaaS management plane based on Tail-Risk Envelopes (TREs): signed, composable per-domain descriptors that combine deterministic guardrails with stochastic rate-latency-impairment models. Using stochastic network calculus, we derive bounds on end-to-end delay-violation probabilities across tandem domains and obtain an optimization-ready risk-budget decomposition. We show that tenant-level reservations prevent bursty traffic from inflating tail latency under TRE contracts. An auditing layer then uses runtime telemetry to estimate extreme-percentile performance, quantify uncertainty, and attribute tail risk to each domain for accountability. Packet-level Monte-Carlo simulations demonstrate improved p99.9 compliance under overload via admission control, and robust tenant isolation under correlated burstiness.
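The tandem-domain risk budget can be illustrated with the union bound: if each domain i guarantees P(D_i > d_i) ≤ ε_i, then an end-to-end delay above Σd_i requires at least one domain to exceed its own d_i, so P(ΣD_i > Σd_i) ≤ Σε_i, with no independence assumption. The sketch below assumes a hypothetical exponential tail model per domain, P(D_i > T_i + x) ≤ M_i·e^(-θ_i·x), which stands in for the paper's TRE descriptors rather than reproducing them; `DomainTail`, `split_budget` and `end_to_end_target` are illustrative names, and the domain parameters in the usage are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class DomainTail:
    """Hypothetical per-domain tail model (illustrative, not the paper's TRE format).

    Assumes P(D > latency + x) <= prefactor * exp(-decay * x) for all x >= 0.
    """
    name: str
    latency: float    # deterministic latency term T_i, in ms
    prefactor: float  # M_i > 0
    decay: float      # theta_i > 0, in 1/ms

    def delay_at(self, eps: float) -> float:
        """Smallest d_i for which the model guarantees P(D_i > d_i) <= eps."""
        # When eps >= prefactor the bound already holds at x = 0, hence the clamp.
        return self.latency + max(0.0, math.log(self.prefactor / eps)) / self.decay


def split_budget(domains, eps_total):
    """Split eps_total across domains to minimise the sum of delay targets.

    Minimising sum_i ln(M_i / eps_i) / theta_i subject to sum_i eps_i = eps_total
    gives eps_i proportional to 1 / theta_i (Lagrange condition). If the clamp in
    delay_at activates, the split may be suboptimal, but the bound stays valid.
    """
    weights = [1.0 / d.decay for d in domains]
    total = sum(weights)
    return [eps_total * w / total for w in weights]


def end_to_end_target(domains, eps_list):
    """Delay target d with P(sum_i D_i > d) <= sum(eps_list), via the union bound."""
    return sum(d.delay_at(e) for d, e in zip(domains, eps_list))


if __name__ == "__main__":
    domains = [
        DomainTail("access", latency=2.0, prefactor=1.0, decay=2.0),
        DomainTail("transport", latency=5.0, prefactor=1.0, decay=0.5),
        DomainTail("edge-compute", latency=10.0, prefactor=1.0, decay=1.0),
    ]
    eps = 1e-3  # end-to-end violation budget
    optimal = split_budget(domains, eps)
    equal = [eps / len(domains)] * len(domains)
    for d, e in zip(domains, optimal):
        print(f"{d.name}: eps_i={e:.2e}, d_i={d.delay_at(e):.2f} ms")
    print("optimal split:", round(end_to_end_target(domains, optimal), 2), "ms")
    print("equal split:  ", round(end_to_end_target(domains, equal), 2), "ms")
```

Under this model the optimal split gives ε_i ∝ 1/θ_i, so domains with heavier tails (small θ_i) receive a larger share of the violation budget; with the example parameters this yields a slightly smaller end-to-end delay target than an equal split.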
