Mohammad Reza Javan

SP
3papers
109citations
Novelty48%
AI Score24

3 Papers

SPJul 13, 2021
Learning based E2E Energy Efficient in Joint Radio and NFV Resource Allocation for 5G and Beyond Networks

Narges Gholipoor, Ali Nouruzi, Shima Salarhosseini et al.

In this paper, we propose a joint radio and core resource allocation framework for NFV-enabled networks. In the proposed system model, the goal is to maximize energy efficiency (EE), by guaranteeing end-to-end (E2E) quality of service (QoS) for different service types. To this end, we formulate an optimization problem in which power and spectrum resources are allocated in the radio part. In the core part, the chaining, placement, and scheduling of functions are performed to ensure the QoS of all users. This joint optimization problem is modeled as a Markov decision process (MDP), considering time-varying characteristics of the available resources and wireless channels. A soft actor-critic deep reinforcement learning (SAC-DRL) algorithm based on the maximum entropy framework is subsequently utilized to solve the above MDP. Numerical results reveal that the proposed joint approach based on the SAC-DRL algorithm could significantly reduce energy consumption compared to the case in which R-RA and NFV-RA problems are optimized separately.

SPMay 10, 2021
AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning

Mohammad Parvini, Mohammad Reza Javan, Nader Mokari et al.

This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system. Multiple autonomous platoons exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the cooperative awareness messages (CAMs) to their followers while ensuring timely delivery of safety-critical messages to the Road-Side Unit (RSU). Due to the challenges of dynamic channel conditions, centralized resource management schemes that require global information are inefficient and lead to large signaling overheads. Hence, we exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy. Existing MARL algorithms consider a holistic reward function for the group's collective success, which often ends up with unsatisfactory results and cannot guarantee an optimal policy for each agent. Consequently, motivated by the existing literature in RL, we propose a novel MARL framework that trains two critics with the following goals: A global critic which estimates the global expected reward and motivates the agents toward a cooperating behavior and an exclusive local critic for each agent that estimates the local individual reward. Furthermore, based on the tasks each agent has to accomplish, the individual reward of each agent is decomposed into multiple sub-reward functions where task-wise value functions are learned separately. Numerical results indicate our proposed algorithm's effectiveness compared with the conventional RL methods applied in this area.

SPFeb 2, 2021
Proactive and AoI-aware Failure Recovery for Stateful NFV-enabled Zero-Touch 6G Networks: Model-Free DRL Approach

Amirhossein Shaghaghi, Abolfazl Zakeri, Nader Mokari et al.

In this paper, we propose a Zero-Touch, deep reinforcement learning (DRL)-based Proactive Failure Recovery framework called ZT-PFR for stateful network function virtualization (NFV)-enabled networks. To this end, we formulate a resource-efficient optimization problem minimizing the network cost function including resource cost and wrong decision penalty. As a solution, we propose state-of-the-art DRL-based methods such as soft-actor-critic (SAC) and proximal-policy-optimization (PPO). In addition, to train and test our DRL agents, we propose a novel impending-failure model. Moreover, to keep network status information at an acceptable freshness level for appropriate decision-making, we apply the concept of age of information to strike a balance between the event and scheduling based monitoring. Several key systems and DRL algorithm design insights for ZT-PFR are drawn from our analysis and simulation results. For example, we use a hybrid neural network, consisting long short-term memory layers in the DRL agents structure, to capture impending-failures time dependency.