10.1 NI May 24
K8S Power Irrigation: Deep Reinforcement Learning for Performance-Aware Power Efficiency of Kubernetes Cloud-Native Microservices
Zouhir Bellal, Laaziz Lahlou, Nadjia Kara et al.
Modern cloud platforms are facing a sharp increase in power demand driven by the rapid adoption of AI-powered applications, making power optimization urgent under net-zero commitments and sustainability goals. Yet, reducing power in production remains challenging for latency-sensitive microservices, where performance violations directly affect user experience and operational risk. Such services exhibit heterogeneous workload characteristics and dynamic load patterns. In multi-tenant environments, contention on shared uncore resources, including last-level cache and memory bandwidth, can degrade performance, especially for memory-intensive workloads. As a safeguard, providers often run servers in performance mode, fixing core and uncore frequencies at high levels. Existing power governors largely ignore application-level performance requirements and uncore interference, leading to systematic power over-provisioning. To address this, we introduce K8SPI, a hierarchical reinforcement learning controller that jointly optimizes CPU core and uncore frequencies for cloud-native deployments. K8SPI uses a two-stage architecture: a coarse-grained agent rapidly mitigates performance violations, while a fine-grained agent minimizes power once requirements are satisfied. Using telemetry from hardware, Kubernetes, and application layers, K8SPI adapts to workload heterogeneity and cross-microservice interference. We evaluate K8SPI on a Kubernetes testbed across multiple scenarios. Results show that K8SPI reduces node-level power by 23-30% compared with the Linux performance governor while keeping performance requirement violations below 2-3%, even under severe uncore contention and dynamic load fluctuations.
SE Oct 18, 2023
Telecom AI Native Systems in the Age of Generative AI -- An Engineering Perspective
Ricardo Britto, Timothy Murphy, Massimo Iovene et al.
The rapid advancements in Artificial Intelligence (AI), particularly in generative AI and foundational models (FMs), have ushered in transformative changes across various industries. Large language models (LLMs), a type of FM, have demonstrated their prowess in natural language processing tasks and content generation, revolutionizing how we interact with software products and services. This article explores the integration of FMs in the telecommunications industry, shedding light on the concept of AI native telco, where AI is seamlessly woven into the fabric of telecom products. It delves into the engineering considerations and unique challenges associated with implementing FMs into the software life cycle, emphasizing the need for AI native-first approaches. Despite the enormous potential of FMs, ethical, regulatory, and operational challenges require careful consideration, especially in mission-critical telecom contexts. As the telecom industry seeks to harness the power of AI, a comprehensive understanding of these challenges is vital to thrive in a fiercely competitive market.
NI Mar 6, 2025
Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences
Adnan Shahid, Adrian Kliks, Ahmed Al-Tahmeesschi et al.
This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced by modern telecom networks. The paper covers a wide range of topics, from the architecture and deployment strategies of LTMs to their applications in network management, resource allocation, and optimization. It also explores the regulatory, ethical, and standardization considerations for LTMs, offering insights into their future integration into telecom infrastructure. The goal is to provide a comprehensive roadmap for the adoption of LTMs to enhance scalability, performance, and user-centric innovation in telecom networks.