73.5SPMay 19
PilotWiMAE: Pilot-Native Representation Learning for Wireless ChannelsBerkay Guler, Giovanni Geraci, Hamid Jafarkhani
Channel foundation models assume access to fully observed channels, an assumption that fails in deployment. We introduce PilotWiMAE, a self-supervised framework whose encoder ingests noisy pilot observations directly and whose attention factorizes along the axis separating temporal from joint space-frequency processing, an inductive bias inspired by the physics of the problem. Pilot input shrinks the observation space by up to two orders of magnitude and also removes the unrealistic assumption of full-CSI availability while incurring lower latency. The factorized design generates robust representations by exploiting the separable channel structure and allows a pretraining mask ratio of $99\%$. We pair patch-normalized reconstruction, which captures small-scale fading structure, with an auxiliary scale loss that recovers the large-scale fading features, and use an AWGN curriculum to match pilot noise at pretraining and deployment. Pretrained solely on $3.5$\,GHz and evaluated at $28$\,GHz across in-distribution and out-of-distribution settings, PilotWiMAE's cross-frequency beam selection and channel characterization beat supervised baselines despite operating on a smaller observation space. To weaken the coupling between decoder capacity and representation quality, we further propose a decoder-centric pretraining stage following the encoder-decoder joint pretraining, which allows PilotWiMAE to demonstrate competitive channel estimation without sacrificing representation quality. To foster further work in this direction, we release the PilotWiMAE pretrained weights and training pipeline, together with CSIGen, our Sionna-based ray-tracing channel-generation tool, and the channel datasets used in this work.
LGMay 14, 2025
A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder LearningBerkay Guler, Giovanni Geraci, Hamid Jafarkhani
Current applications of self-supervised learning to wireless channel representation often borrow paradigms developed for text and image processing, without fully addressing the unique characteristics and constraints of wireless communications. To bridge this gap, we introduce ContraWiMAE, Wireless Contrastive Masked Autoencoder, a transformer-based foundation model that unifies masked reconstruction and masked contrastive learning for wireless channel representation. Our key innovation is a new wireless-inspired contrastive objective that exploits the inherent characteristics of wireless environment, including noise, fading, and partial observability, as natural augmentation. Through extensive evaluation on unseen scenarios and conditions, we demonstrate our method's effectiveness in multiple downstream tasks, including cross-frequency beam selection, line-of-sight detection, and channel estimation. ContraWiMAE exhibits superior linear separability and adaptability in diverse wireless environments, demonstrating exceptional data efficiency and competitive performance compared with supervised baselines under challenging conditions. Comparative evaluations against a state-of-the-art wireless channel foundation model confirm the superior performance and data efficiency of our approach, highlighting its potential as a powerful baseline for future research in self-supervised wireless channel representation learning. To foster further work in this direction, we release the model weights and training pipeline for ContraWiMAE.
LGMay 14, 2025
AdaFortiTran: An Adaptive Transformer Model for Robust OFDM Channel EstimationBerkay Guler, Hamid Jafarkhani
Deep learning models for channel estimation in Orthogonal Frequency Division Multiplexing (OFDM) systems often suffer from performance degradation under fast-fading channels and low-SNR scenarios. To address these limitations, we introduce the Adaptive Fortified Transformer (AdaFortiTran), a novel model specifically designed to enhance channel estimation in challenging environments. Our approach employs convolutional layers that exploit locality bias to capture strong correlations between neighboring channel elements, combined with a transformer encoder that applies the global Attention mechanism to channel patches. This approach effectively models both long-range dependencies and spectro-temporal interactions within single OFDM frames. We further augment the model's adaptability by integrating nonlinear representations of available channel statistics SNR, delay spread, and Doppler shift as priors. A residual connection is employed to merge global features from the transformer with local features from early convolutional processing, followed by final convolutional layers to refine the hierarchical channel representation. Despite its compact architecture, AdaFortiTran achieves up to 6 dB reduction in mean squared error (MSE) compared to state-of-the-art models. Tested across a wide range of Doppler shifts (200-1000 Hz), SNRs (0 to 25 dB), and delay spreads (50-300 ns), it demonstrates superior robustness in high-mobility environments.