Pablo Soldati

LG
h-index11
6papers
15citations
Novelty35%
AI Score37

6 Papers

OCJul 1, 2014
Modular design of jointly optimal controllers and forwarding policies for wireless control

Burak Demirel, Zhenhua Zou, Pablo Soldati et al.

We consider the joint design of packet forwarding policies and controllers for wireless control loops where sensor measurements are sent to the controller over an unreliable and energy-constrained multi-hop wireless network. For fixed sampling rate of the sensor, the co-design problem separates into two well-defined and independent subproblems: transmission scheduling for maximizing the deadline-constrained reliability and optimal control under packet loss. We develop optimal and implementable solutions for these subproblems and show that the optimally co-designed system can be efficiently found. Numerical examples highlight the many trade-offs involved and demonstrate the power of our approach.

LGJun 9, 2023
Design Principles for Model Generalization and Scalable AI Integration in Radio Access Networks

Pablo Soldati, Euhanna Ghadimi, Burak Demirel et al.

Artificial intelligence (AI) has emerged as a powerful tool for addressing complex and dynamic tasks in radio communication systems. Research in this area, however, focused on AI solutions for specific, limited conditions, hindering models from learning and adapting to generic situations, such as those met across radio communication systems. This paper emphasizes the pivotal role of achieving model generalization in enhancing performance and enabling scalable AI integration within radio communications. We outline design principles for model generalization in three key domains: environment for robustness, intents for adaptability to system objectives, and control tasks for reducing AI-driven control loops. Implementing these principles can decrease the number of models deployed and increase adaptability in diverse radio communication environments. To address the challenges of model generalization in communication systems, we propose a learning architecture that leverages centralization of training and data management functionalities, combined with distributed data generation. We illustrate these concepts by designing a generalized link adaptation algorithm, demonstrating the benefits of our proposed approach.

LGNov 9, 2025
Practical Policy Distillation for Reinforcement Learning in Radio Access Networks

Sara Khosravi, Burak Demirel, Linghui Zhou et al.

Adopting artificial intelligence (AI) in radio access networks (RANs) presents several challenges, including limited availability of link-level measurements (e.g., CQI reports), stringent real-time processing constraints (e.g., sub-1 ms per TTI), and network heterogeneity (different spectrum bands, cell types, and vendor equipment). A critical yet often overlooked barrier lies in the computational and memory limitations of RAN baseband hardware, particularly in legacy 4th Generation (4G) systems, which typically lack on-chip neural accelerators. As a result, only lightweight AI models (under 1 Mb and sub-100~μs inference time) can be effectively deployed, limiting both their performance and applicability. However, achieving strong generalization across diverse network conditions often requires large-scale models with substantial resource demands. To address this trade-off, this paper investigates policy distillation in the context of a reinforcement learning-based link adaptation task. We explore two strategies: single-policy distillation, where a scenario-agnostic teacher model is compressed into one generalized student model; and multi-policy distillation, where multiple scenario-specific teachers are consolidated into a single generalist student. Experimental evaluations in a high-fidelity, 5th Generation (5G)-compliant simulator demonstrate that both strategies produce compact student models that preserve the teachers' generalization capabilities while complying with the computational and memory limitations of existing RAN hardware.

LGOct 30, 2024
Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation

Samuele Peri, Alessio Russo, Gabor Fodor et al.

Link adaptation (LA) is an essential function in modern wireless communication systems that dynamically adjusts the transmission rate of a communication link to match time- and frequency-varying radio link conditions. However, factors such as user mobility, fast fading, imperfect channel quality information, and aging of measurements make the modeling of LA challenging. To bypass the need for explicit modeling, recent research has introduced online reinforcement learning (RL) approaches as an alternative to the more commonly used rule-based algorithms. Yet, RL-based approaches face deployment challenges, as training in live networks can potentially degrade real-time performance. To address this challenge, this paper considers offline RL as a candidate to learn LA policies with minimal effects on the network operation. We propose three LA designs based on batch-constrained deep Q-learning, conservative Q-learning, and decision transformer. Our results show that offline RL algorithms can match the performance of state-of-the-art online RL methods when data is collected with a proper behavioral policy.

LGJul 9, 2025
Generalization in Reinforcement Learning for Radio Access Networks

Burak Demirel, Yu Wang, Cristian Tatino et al.

Modern RAN operate in highly dynamic and heterogeneous environments, where hand-tuned, rule-based RRM algorithms often underperform. While RL can surpass such heuristics in constrained settings, the diversity of deployments and unpredictable radio conditions introduce major generalization challenges. Data-driven policies frequently overfit to training conditions, degrading performance in unseen scenarios. To address this, we propose a generalization-centered RL framework for RAN control that: (i) robustly reconstructs dynamically varying states from partial and noisy observations, while encoding static and semi-static information, such as radio nodes, cell attributes, and their topology, through graph representations; (ii) applies domain randomization to broaden the training distribution; and (iii) distributes data generation across multiple actors while centralizing training in a cloud-compatible architecture aligned with O-RAN principles. Although generalization increases computational and data-management complexity, our distributed design mitigates this by scaling data collection and training across diverse network conditions. Applied to downlink link adaptation in five 5G benchmarks, our policy improves average throughput and spectral efficiency by ~10% over an OLLA baseline (10% BLER target) in full-buffer MIMO/mMIMO and by >20% under high mobility. It matches specialized RL in full-buffer traffic and achieves up to 4- and 2-fold gains in eMBB and mixed-traffic benchmarks, respectively. In nine-cell deployments, GAT models offer 30% higher throughput over MLP baselines. These results, combined with our scalable architecture, offer a path toward AI-native 6G RAN using a single, generalizable RL agent.

LGFeb 1
From Intents to Actions: Agentic AI in Autonomous Networks

Burak Demirel, Pablo Soldati, Yu Wang

Telecommunication networks are increasingly expected to operate autonomously while supporting heterogeneous services with diverse and often conflicting intents -- that is, performance objectives, constraints, and requirements specific to each service. However, transforming high-level intents -- such as ultra-low latency, high throughput, or energy efficiency -- into concrete control actions (i.e., low-level actuator commands) remains beyond the capability of existing heuristic approaches. This work introduces an Agentic AI system for intent-driven autonomous networks, structured around three specialized agents. A supervisory interpreter agent, powered by language models, performs both lexical parsing of intents into executable optimization templates and cognitive refinement based on feedback, constraint feasibility, and evolving network conditions. An optimizer agent converts these templates into tractable optimization problems, analyzes trade-offs, and derives preferences across objectives. Lastly, a preference-driven controller agent, based on multi-objective reinforcement learning, leverages these preferences to operate near the Pareto frontier of network performance that best satisfies the original intent. Collectively, these agents enable networks to autonomously interpret, reason over, adapt to, and act upon diverse intents and network conditions in a scalable manner.