CV | SP | Aug 27, 2025

Enhancing Automatic Modulation Recognition With a Reconstruction-Driven Vision Transformer Under Limited Labels

Hossein Ahmadi, Banafsheh Saffari, Sajjad Emdadi Mahdimahalleh, Mohammad Esmaeil Safari, Aria Ahmadi

arXiv:2508.20193v2 | 2 citations | h-index: 1

Originality: Incremental advance

AI Analysis

The framework offers a label-efficient solution for cognitive radio and wireless communication, though the advance is incremental because it builds on existing ViT and self-supervised methods.

The paper tackles automatic modulation recognition with limited labeled data. It proposes a unified Vision Transformer framework that integrates supervised, self-supervised, and reconstruction objectives, and on the RML2018.01A dataset it approaches ResNet-level accuracy with only 15-20% labeled data.

Automatic modulation recognition (AMR) is critical for cognitive radio, spectrum monitoring, and secure wireless communication. However, existing solutions often rely on large labeled datasets or multi-stage training pipelines, which limit scalability and generalization in practice. We propose a unified Vision Transformer (ViT) framework that integrates supervised, self-supervised, and reconstruction objectives. The model combines a ViT encoder, a lightweight convolutional decoder, and a linear classifier; the reconstruction branch maps augmented signals back to their originals, anchoring the encoder to fine-grained I/Q structure. This strategy promotes robust, discriminative feature learning during pretraining, while partial label supervision in fine-tuning enables effective classification with limited labels. On the RML2018.01A dataset, our approach outperforms supervised CNN and ViT baselines in low-label regimes, approaches ResNet-level accuracy with only 15-20% labeled data, and maintains strong performance across varying SNR levels. Overall, the framework provides a simple, generalizable, and label-efficient solution for AMR.
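
The training signal described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not give the loss functions, augmentations, or patch size, so the additive Gaussian noise augmentation, the mean-squared-error reconstruction term, the cross-entropy classification term, and the names `augment`, `patchify`, `joint_loss`, and `recon_weight` are all assumptions chosen for illustration. The encoder, decoder, and classifier are left out; the sketch only shows how a reconstruction target (the clean signal) and an optional label could be combined so that unlabeled frames still contribute to training.

```python
import math
import random


def augment(iq, noise_std=0.1, rng=None):
    """Perturb each (I, Q) sample with Gaussian noise (an assumed augmentation)."""
    rng = rng or random.Random(0)
    return [(i + rng.gauss(0.0, noise_std), q + rng.gauss(0.0, noise_std))
            for i, q in iq]


def patchify(iq, patch_len):
    """Split an I/Q sequence into non-overlapping patches, one per ViT token."""
    if len(iq) % patch_len != 0:
        raise ValueError("sequence length must be a multiple of patch_len")
    return [iq[k:k + patch_len] for k in range(0, len(iq), patch_len)]


def reconstruction_loss(reconstructed, original):
    """Mean squared error between the decoder output and the clean original."""
    total = sum((ri - oi) ** 2 + (rq - oq) ** 2
                for (ri, rq), (oi, oq) in zip(reconstructed, original))
    return total / (2 * len(original))


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def joint_loss(reconstructed, original, logits, label=None, recon_weight=1.0):
    """Reconstruction term always applies; the supervised term only if labeled."""
    loss = recon_weight * reconstruction_loss(reconstructed, original)
    if label is not None:
        loss += cross_entropy(logits, label)
    return loss


# Toy usage: a short unit-circle I/Q frame, its noisy view, and both loss cases.
clean = [(math.cos(0.1 * k), math.sin(0.1 * k)) for k in range(8)]
noisy = augment(clean)               # input that an encoder would see
tokens = patchify(noisy, 4)          # two tokens of four I/Q samples each
unlabeled = joint_loss(noisy, clean, logits=[0.0, 0.0])
labeled = joint_loss(noisy, clean, logits=[0.0, 0.0], label=1)
```

In this sketch the noisy frame stands in for the decoder output, so `unlabeled` is simply the distortion introduced by the augmentation. In a real model the decoder's reconstruction would take its place, and the reconstruction term would push the encoder to retain fine-grained I/Q structure even for frames without labels.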
