Robust Beam Codebooks for mmWave/THz Systems: Toward a Stochastic RL Approach
This work addresses the need for robust beamforming in practical mmWave/THz systems, offering an incremental improvement over existing RL methods by focusing on resilience to hardware impairments and feedback noise.
The paper tackles suboptimal beamforming codebooks in mmWave/THz massive MIMO systems operating under NLoS conditions and hardware constraints. It introduces a robust multi-agent RL framework that learns codebooks from environmental feedback. In simulations, SAC outperforms the deterministic methods (DDPG and TD3), achieving higher beamforming gains and greater training stability.
Millimeter-wave (mmWave) and terahertz (THz) massive MIMO systems often rely on predefined beamforming codebooks, which are usually suboptimal in Non-Line-of-Sight (NLoS) conditions and for hardware-limited transceivers. Reinforcement Learning (RL) enables adaptive, data-driven codebook design without explicit Channel State Information (CSI), but the robustness of such algorithms under practical conditions remains underexplored. This paper introduces a robust multi-agent RL framework that learns beam codebooks directly from environmental feedback, eliminating the need for prior channel knowledge. The method is well suited to real-world deployments facing unpredictable propagation and hardware constraints. We conduct a comprehensive analysis of three off-policy algorithms, namely Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC), and evaluate their resilience to hardware impairments and feedback noise. Simulations show that SAC consistently outperforms the deterministic methods, achieving higher beamforming gains and greater stability in NLoS scenarios, even under severe impairments. These results demonstrate the promise of RL-based codebook design for robust mmWave/THz massive MIMO systems.
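The abstract does not spell out the system model. As a hedged illustration of what "learning from environmental feedback" under "hardware impairments and feedback noise" could look like, the sketch below assumes a half-wavelength uniform linear array, an analog (constant-modulus) beam built from b-bit phase shifters, a few-path NLoS channel with no line-of-sight component, and additive Gaussian noise on the reported gain. All function names and parameters (`nlos_channel`, `quantize_phases`, `beamforming_gain`, `noisy_feedback`, `bits`, `noise_std`) are hypothetical and are not taken from the paper. An RL agent in this setting would only observe the noisy scalar reward, never the channel `h`.

```python
import cmath
import math
import random


def ula_steering(n_ant: int, angle_rad: float) -> list[complex]:
    """Steering vector of a half-wavelength-spaced uniform linear array (assumed geometry)."""
    return [cmath.exp(1j * math.pi * n * math.sin(angle_rad)) for n in range(n_ant)]


def nlos_channel(n_ant: int, n_paths: int, rng: random.Random) -> list[complex]:
    """Hypothetical NLoS channel: a sum of a few scattered paths with random complex gains."""
    h = [0j] * n_ant
    for _ in range(n_paths):
        alpha = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2 * n_paths)
        a = ula_steering(n_ant, rng.uniform(-math.pi / 2, math.pi / 2))
        h = [hn + alpha * an for hn, an in zip(h, a)]
    return h


def quantize_phases(phases: list[float], bits: int) -> list[float]:
    """Assumed hardware impairment: snap each phase to the nearest of 2**bits levels."""
    step = 2 * math.pi / (2 ** bits)
    return [step * round(p / step) for p in phases]


def beamforming_gain(phases: list[float], h: list[complex]) -> float:
    """Return |w^H h|^2 for the analog beam w = exp(j*phases) / sqrt(N)."""
    n = len(phases)
    inner = sum(cmath.exp(-1j * p) * hn for p, hn in zip(phases, h)) / math.sqrt(n)
    return abs(inner) ** 2


def noisy_feedback(gain: float, noise_std: float, rng: random.Random) -> float:
    """Reward seen by the agent: the measured gain corrupted by Gaussian feedback noise."""
    return gain + rng.gauss(0.0, noise_std)


if __name__ == "__main__":
    rng = random.Random(0)
    N = 16
    h = nlos_channel(N, n_paths=3, rng=rng)
    # Co-phasing beam: requires full CSI, so it serves only as an upper-bound reference.
    ideal = [cmath.phase(hn) for hn in h]
    ideal_gain = beamforming_gain(ideal, h)
    # The same beam after 2-bit phase quantization, as the hardware would realize it.
    q_gain = beamforming_gain(quantize_phases(ideal, bits=2), h)
    reward = noisy_feedback(q_gain, noise_std=0.1, rng=rng)
    print(f"ideal {ideal_gain:.3f}, 2-bit {q_gain:.3f}, noisy reward {reward:.3f}")
```

In this toy model, the co-phasing beam maximizes the gain over all constant-modulus beams, so phase quantization can only lose gain. The noisy reward is the signal from which DDPG, TD3, or SAC would have to learn each codebook entry.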