69.3CRApr 7Code
LanG -- A Governance-Aware Agentic AI Platform for Unified Security OperationsAnes Abdennebi, Nadjia Kara, Laaziz Lahlou et al.
Modern Security Operations Centers struggle with alert fatigue, fragmented tooling, and limited cross-source event correlation. Challenges that current Security Information Event Management and Extended Detection and Response systems only partially address through fragmented tools. This paper presents the LLM-assisted network Governance (LanG), an open-source, governance-aware agentic AI platform for unified security operations contributing: (i) a Unified Incident Context Record with a correlation engine (F1 = 87%), (ii) an Agentic AI Orchestrator on LangGraph with human-in-the-loop checkpoints, (iii) an LLM-based Rule Generator finetuned on four base models producing deployable Snort 2/3, Suricata, and YARA rules (average acceptance rate 96.2%), (iv) a Three-Phase Attack Reconstructor combining Louvain community detection, LLM-driven hypothesis generation, and Bayesian scoring (87.5% kill-chain accuracy), and (v) a layered Governance-MCP-Agentic AI-Security architecture where all tools are exposed via the Model Context Protocol, governed by an AI Governance Policy Engine with a two-layer guardrail pipeline (regex + Llama Prompt Guard 2 semantic classifier, achieving 98.1% F1 score with experimental zero false positives). Designed for Managed Security Service Providers, the platform supports multi-tenant isolation, role-based access, and fully local deployment. Finetuned anomaly and threat detectors achieve weighted F1 scores of 99.0% and 91.0%, respectively, in intrusion-detection benchmarks, running inferences in $\approx$21 ms with a machine-side mean time to detect of 1.58 s, and the rule generator exceeds 91% deployability on live IDS engines. A systematic comparison against eight SOC platforms confirms that LanG uniquely satisfies multiple industrial capabilities all in one open-source tool, while enforcing selected AI governance policies.
10.1NIMay 24
K8S Power Irrigation: Deep Reinforcement Learning for Performance-Aware Power Efficiency of Kubernetes Cloud-Native MicroservicesZouhir Bellal, Laaziz Lahlou, Nadjia Kara et al.
Modern cloud platforms are facing a sharp increase in power demand driven by the rapid adoption of AI-powered applications, making power optimization urgent under net-zero commitments and sustainability goals. Yet, reducing power in production remains challenging for latency-sensitive microservices, where performance violations directly affect user experience and operational risk. Such services exhibit heterogeneous workload characteristics and dynamic load patterns. In multi-tenant environments, contention on shared uncore resources, including last-level cache and memory bandwidth, can degrade performance, especially for memory-intensive workloads. As a safeguard, providers often run servers in performance mode, fixing core and uncore frequencies at high levels. Existing power governors largely ignore application-level performance requirements and uncore interference, leading to systematic power over-provisioning. To address this, we introduce K8SPI, a hierarchical reinforcement learning controller that jointly optimizes CPU core and uncore frequencies for cloud-native deployments. K8SPI uses a two-stage architecture: a coarse-grained agent rapidly mitigates performance violations, while a fine-grained agent minimizes power once requirements are satisfied. Using telemetry from hardware, Kubernetes, and application layers, K8SPI adapts to workload heterogeneity and cross-microservice interference. We evaluate K8SPI on a Kubernetes testbed across multiple scenarios. Results show that K8SPI reduces node-level power by 23--30\% compared with the Linux performance governor while keeping performance requirement violations below 2--3\%, even under severe uncore contention and dynamic load fluctuations.
50.9CRApr 14
Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inferenceAnes Abdennebi, Nadjia Kara, Laaziz Lahlou
The applications of Generative Artificial Intelligence (GenAI) and their intersections with data-driven fields, such as healthcare, finance, transportation, and information security, have led to significant improvements in service efficiency and low latency. However, this synergy raises serious concerns regarding the security of large language models (LLMs) and their potential impact on the privacy of companies and users' data. Many technology companies that incorporate LLMs in their services with a certain level of command and control bear a risk of data exposure and secret divulgence caused by insecure LLM pipelines, making them vulnerable to multiple attacks such as data poisoning, prompt injection, and model theft. Although several security techniques (input/output sanitization, decentralized learning, access control management, and encryption) were implemented to reduce this risk, there is still an imminent risk of quantum computing attacks, which are expected to break existing encryption algorithms, hence, retrieving secret keys, encrypted sensitive data, and decrypting encrypted models. In this extensive work, we integrate the Post-Quantum Cryptography (PQC) based Lattice-based Homomorphic Encryption (HE) main functions in the LLM's inference pipeline to secure some of its layers against data privacy attacks. We modify the inference pipeline of the transformer architecture for the LLAMA-3 model while injecting the main homomorphic encryption operations provided by the concrete-ml library. We demonstrate high text generation accuracies (up to 98%) with reasonable latencies (237 ms) on an i9 CPU, reaching up to 80 tokens per second, which proves the feasibility and validity of our work while running a FHE-secured LLAMA-3 inference model. Further experiments and analysis are discussed to justify models' text generation latencies and behaviours.