LGJun 10, 2023
A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable ConvergenceKexuan Wang, An Liu, Baishuo Lin
Deep Actor-Critic algorithms, which combine Actor-Critic with deep neural network (DNN), have been among the most prevalent reinforcement learning algorithms for decision-making problems in simulated environments. However, the existing deep Actor-Critic algorithms are still not mature to solve realistic problems with non-convex stochastic constraints and high cost to interact with the environment. In this paper, we propose a single-loop deep Actor-Critic (SLDAC) algorithmic framework for general constrained reinforcement learning (CRL) problems. In the actor step, the constrained stochastic successive convex approximation (CSSCA) method is applied to handle the non-convex stochastic objective and constraints. In the critic step, the critic DNNs are only updated once or a few finite times for each iteration, which simplifies the algorithm to a single-loop framework (the existing works require a sufficient number of updates for the critic step to ensure a good enough convergence of the inner loop for each iteration). Moreover, the variance of the policy gradient estimation is reduced by reusing observations from the old policy. The single-loop design and the observation reuse effectively reduce the agent-environment interaction cost and computational complexity. In spite of the biased policy gradient estimation incurred by the single-loop design and observation reuse, we prove that the SLDAC with a feasible initial point can converge to a Karush-Kuhn-Tuker (KKT) point of the original problem almost surely. Simulations show that the SLDAC algorithm can achieve superior performance with much lower interaction cost.
NIMar 30, 2025
A Hybrid Reinforcement Learning Framework for Hard Latency Constrained Resource SchedulingLuyuan Zhang, An Liu, Kexuan Wang
In the forthcoming 6G era, extend reality (XR) has been regarded as an emerging application for ultra-reliable and low latency communications (URLLC) with new traffic characteristics and more stringent requirements. In addition to the quasi-periodical traffic in XR, burst traffic with both large frame size and random arrivals in some real world low latency communication scenarios has become the leading cause of network congestion or even collapse, and there still lacks an efficient algorithm for the resource scheduling problem under burst traffic with hard latency constraints. We propose a novel hybrid reinforcement learning framework for resource scheduling with hard latency constraints (HRL-RSHLC), which reuses polices from both old policies learned under other similar environments and domain-knowledge-based (DK) policies constructed using expert knowledge to improve the performance. The joint optimization of the policy reuse probabilities and new policy is formulated as an Markov Decision Problem (MDP), which maximizes the hard-latency constrained effective throughput (HLC-ET) of users. We prove that the proposed HRL-RSHLC can converge to KKT points with an arbitrary initial point. Simulations show that HRL-RSHLC can achieve superior performance with faster convergence speed compared to baseline algorithms.
LGJun 19, 2025
Joint User Priority and Power Scheduling for QoS-Aware WMMSE Precoding: A Constrained-Actor Attentive-Critic ApproachKexuan Wang, An Liu
6G wireless networks are expected to support diverse quality-of-service (QoS) demands while maintaining high energy efficiency. Weighted Minimum Mean Square Error (WMMSE) precoding with fixed user priorities and transmit power is widely recognized for enhancing overall system performance but lacks flexibility to adapt to user-specific QoS requirements and time-varying channel conditions. To address this, we propose a novel constrained reinforcement learning (CRL) algorithm, Constrained-Actor Attentive-Critic (CAAC), which uses a policy network to dynamically allocate user priorities and power for WMMSE precoding. Specifically, CAAC integrates a Constrained Stochastic Successive Convex Approximation (CSSCA) method to optimize the policy, enabling more effective handling of energy efficiency goals and satisfaction of stochastic non-convex QoS constraints compared to traditional and existing CRL methods. Moreover, CAAC employs lightweight attention-enhanced Q-networks to evaluate policy updates without prior environment model knowledge. The network architecture not only enhances representational capacity but also boosts learning efficiency. Simulation results show that CAAC outperforms baselines in both energy efficiency and QoS satisfaction.
LGJun 19, 2025
A Lightweight RL-Driven Deep Unfolding Network for Robust WMMSE Precoding in Massive MU-MIMO-OFDM SystemsKexuan Wang, An Liu
Weighted Minimum Mean Square Error (WMMSE) precoding is widely recognized for its near-optimal weighted sum rate performance. However, its practical deployment in massive multi-user (MU) multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems is hindered by the assumption of perfect channel state information (CSI) and high computational complexity. To address these issues, we first develop a wideband stochastic WMMSE (SWMMSE) algorithm that iteratively maximizes the ergodic weighted sum-rate (EWSR) under imperfect CSI. Building on this, we propose a lightweight reinforcement learning (RL)-driven deep unfolding (DU) network (RLDDU-Net), where each SWMMSE iteration is mapped to a network layer. Specifically, its DU module integrates approximation techniques and leverages beam-domain sparsity as well as frequency-domain subcarrier correlation, significantly accelerating convergence and reducing computational overhead. Furthermore, the RL module adaptively adjusts the network depth and generates compensation matrices to mitigate approximation errors. Simulation results under imperfect CSI demonstrate that RLDDU-Net outperforms existing baselines in EWSR performance while offering superior computational and convergence efficiency.
SYMar 12, 2025
Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data TrafficKexuan Wang, An Liu
In XR downlink transmission, energy-efficient power scheduling (EEPS) is essential for conserving power resource while delivering large data packets within hard-latency constraints. Traditional constrained reinforcement learning (CRL) algorithms show promise in EEPS but still struggle with non-convex stochastic constraints, non-stationary data traffic, and sparse delayed packet dropout feedback (rewards) in XR. To overcome these challenges, this paper models the EEPS in XR as a dynamic parameter-constrained Markov decision process (DP-CMDP) with a varying transition function linked to the non-stationary data traffic and solves it by a proposed context-aware constrained reinforcement learning (CACRL) algorithm, which consists of a context inference (CI) module and a CRL module. The CI module trains an encoder and multiple potential networks to characterize the current transition function and reshape the packet dropout rewards according to the context, transforming the original DP-CMDP into a general CMDP with immediate dense rewards. The CRL module employs a policy network to make EEPS decisions under this CMDP and optimizes the policy using a constrained stochastic successive convex approximation (CSSCA) method, which is better suited for non-convex stochastic constraints. Finally, theoretical analyses provide deep insights into the CADAC algorithm, while extensive simulations demonstrate that it outperforms advanced baselines in both power conservation and satisfying packet dropout constraints.