CVAINIJun 27, 2024

ViT LoS V2X: Vision Transformers for Environment-aware LoS Blockage Prediction for 6G Vehicular Networks

arXiv:2407.15023v111 citations
Originality Incremental advance
AI Analysis

This addresses reliable communication for autonomous vehicles to prevent accidents, though it is an incremental improvement over existing methods.

The paper tackles the problem of predicting line-of-sight blockages in 6G vehicular networks using multimodal sensor data, achieving over 95% prediction accuracy with a deep learning approach combining CNNs, Vision Transformers, and GRUs.

As wireless communication technology progresses towards the sixth generation (6G), high-frequency millimeter-wave (mmWave) communication has emerged as a promising candidate for enabling vehicular networks. It offers high data rates and low-latency communication. However, obstacles such as buildings, trees, and other vehicles can cause signal attenuation and blockage, leading to communication failures that can result in fatal accidents or traffic congestion. Predicting blockages is crucial for ensuring reliable and efficient communications. Furthermore, the advent of 6G technology is anticipated to integrate advanced sensing capabilities, utilizing a variety of sensor types. These sensors, ranging from traditional RF sensors to cameras and Lidar sensors, are expected to provide access to rich multimodal data, thereby enriching communication systems with a wealth of additional contextual information. Leveraging this multimodal data becomes essential for making precise network management decisions, including the crucial task of blockage detection. In this paper, we propose a Deep Learning (DL)-based approach that combines Convolutional Neural Networks (CNNs) and customized Vision Transformers (ViTs) to effectively extract essential information from multimodal data and predict blockages in vehicular networks. Our method capitalizes on the synergistic strengths of CNNs and ViTs to extract features from time-series multimodal data, which include images and beam vectors. To capture temporal dependencies between the extracted features and the blockage state at future time steps, we employ a Gated Recurrent Unit (GRU)-based architecture. Our results show that the proposed approach achieves high accuracy and outperforms state-of-the-art solutions, achieving more than $95\%$ accurate predictions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes