Structure-Aware Multimodal LLM Framework for Trustworthy Near-Field Beam Prediction
This work addresses beam alignment in 3D low-altitude environments for wireless communication systems, offering a new approach to a known bottleneck.
The paper tackles the inefficiency of conventional beam training in near-field XL-MIMO systems by proposing an LLM-driven multimodal framework that fuses historical GPS data, RGB images, LiDAR data, and task-specific textual prompts to learn spatial dynamics and improve environmental comprehension for beam prediction.
In near-field extremely large-scale multiple-input multiple-output (XL-MIMO) systems, spherical wavefront propagation expands the traditional beam codebook into the joint angular-distance domain, rendering conventional beam training prohibitively inefficient, especially in complex three-dimensional (3D) low-altitude environments. Furthermore, since near-field beam variations are tightly coupled not only with user positions but also with the physical surroundings, precise beam alignment demands a deep understanding of the environment. To address this, we propose a large language model (LLM)-driven multimodal framework that fuses historical GPS data, RGB images, LiDAR data, and strategically designed task-specific textual prompts. By utilizing the powerful emergent reasoning and generalization capabilities of the LLM, our approach learns complex spatial dynamics to achieve superior environmental comprehension...
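To make the codebook expansion described above concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a half-wavelength uniform linear array and the textbook exact-distance (spherical-wavefront) array response. The function names, the 256-element array, the 1 cm wavelength, and the sampling grids are all illustrative assumptions that do not come from the source.

```python
import numpy as np


def element_positions(num_antennas, wavelength):
    """Positions (m) of a half-wavelength ULA, centred on the array midpoint."""
    spacing = wavelength / 2.0
    return (np.arange(num_antennas) - (num_antennas - 1) / 2.0) * spacing


def far_field_steering(num_antennas, wavelength, theta):
    """Planar-wavefront response: depends on the angle theta (rad) only."""
    pos = element_positions(num_antennas, wavelength)
    phase = 2.0 * np.pi / wavelength * pos * np.sin(theta)
    return np.exp(1j * phase) / np.sqrt(num_antennas)


def near_field_steering(num_antennas, wavelength, theta, distance):
    """Spherical-wavefront response: depends on angle AND distance (m)."""
    pos = element_positions(num_antennas, wavelength)
    # Exact element-to-user distance from the law of cosines.
    r_n = np.sqrt(distance**2 + pos**2 - 2.0 * distance * pos * np.sin(theta))
    phase = -2.0 * np.pi / wavelength * (r_n - distance)
    return np.exp(1j * phase) / np.sqrt(num_antennas)


def near_field_codebook(num_antennas, wavelength, angles, distances):
    """One codeword per (angle, distance) pair; rows are codewords."""
    return np.array([
        near_field_steering(num_antennas, wavelength, theta, r)
        for theta in angles
        for r in distances
    ])


if __name__ == "__main__":
    N, lam = 256, 0.01  # assumed array size and wavelength (about 30 GHz)
    angles = np.linspace(-np.pi / 3, np.pi / 3, 64)   # assumed angular grid
    distances = np.linspace(3.0, 100.0, 8)            # assumed distance grid
    codebook = near_field_codebook(N, lam, angles, distances)
    print("far-field codewords :", len(angles))
    print("near-field codewords:", codebook.shape[0])

    # Beamforming gain of a far-field codeword toward a nearby user.
    near_user = near_field_steering(N, lam, 0.3, 5.0)
    mismatch = abs(np.vdot(far_field_steering(N, lam, 0.3), near_user))
    print("far-field beam gain on a user 5 m away:", mismatch)
```

Under these assumptions, an exhaustive sweep must test every (angle, distance) pair instead of every angle, so the training overhead grows by a factor equal to the number of distance samples. The mismatch value also shows that a far-field codeword loses most of its gain for a user well inside the near-field region, which is why the distance dimension cannot simply be ignored.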