MERLIN: Building Low-SNR Robust Multimodal LLMs for Electromagnetic Signals
This work provides foundational resources and a robust model for researchers and practitioners developing MLLMs for electromagnetic signal analysis, particularly under challenging low-SNR conditions. It contributes a dataset, a benchmark, and a training framework for the EM domain.
This paper addresses the challenges of applying Multimodal Large Language Models (MLLMs) to the electromagnetic (EM) domain, specifically focusing on data scarcity, lack of benchmarks, and fragility in low Signal-to-Noise Ratio (SNR) environments. The authors introduce EM-100k, a dataset of over 100,000 EM signal-text pairs, and EM-Bench, a comprehensive benchmark for EM signal-to-text tasks. They also propose MERLIN, a novel training framework that achieves state-of-the-art performance on EM-Bench and demonstrates remarkable robustness in low-SNR settings.
The paradigm of Multimodal Large Language Models (MLLMs) offers a promising blueprint for advancing the electromagnetic (EM) domain. However, prevailing approaches often deviate from the native MLLM paradigm, relying instead on task-specific or pipelined architectures that fundamentally limit model performance and generalization. Fully realizing the potential of MLLMs in the EM domain requires overcoming three main challenges: (1) Data. High-quality datasets that pair EM signals with descriptive text annotations for MLLM pre-training are scarce; (2) Benchmark. There is no comprehensive benchmark for systematically evaluating and comparing models on EM signal-to-text tasks; (3) Model. Existing models are fragile in low Signal-to-Noise Ratio (SNR) environments, where key signal features can be obscured, leading to significant performance degradation. To address these challenges, we introduce a tripartite contribution to establish a foundation for MLLMs in the EM domain. First, to overcome data scarcity, we construct and release EM-100k, a large-scale dataset comprising over 100,000 EM signal-text pairs. Second, to enable rigorous and standardized evaluation, we propose EM-Bench, the most comprehensive benchmark featuring diverse downstream tasks spanning from perception to reasoning. Finally, to tackle the core modeling challenge, we present MERLIN, a novel training framework designed not only to align low-level signal representations with high-level semantic text, but also to explicitly enhance model robustness and performance in challenging low-SNR environments. Comprehensive experiments validate our method, showing that MERLIN achieves state-of-the-art performance on EM-Bench and exhibits remarkable robustness in low-SNR settings.
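To make the low-SNR regime concrete, the sketch below shows one common way to simulate a target SNR: additive white Gaussian noise is scaled so that the ratio of signal power to noise power matches a chosen value in decibels. This is an illustration only and is not taken from the MERLIN training procedure or the EM-Bench protocol; the function names `add_awgn` and `measured_snr_db` and the test tone are hypothetical.

```python
import math
import random


def add_awgn(iq, snr_db, rng=None):
    """Return complex samples `iq` plus white Gaussian noise at `snr_db` (assumed helper)."""
    if rng is None:
        rng = random.Random()
    # Average signal power: mean of |x|^2 over all samples.
    signal_power = sum(abs(x) ** 2 for x in iq) / len(iq)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    # Complex noise splits its power equally between the I and Q components.
    sigma = math.sqrt(noise_power / 2)
    return [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in iq]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean reference and its noisy version."""
    signal_power = sum(abs(x) ** 2 for x in clean) / len(clean)
    noise_power = sum(abs(n - x) ** 2 for x, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal: a unit-amplitude complex tone.
tone = [complex(math.cos(0.1 * k), math.sin(0.1 * k)) for k in range(20000)]
noisy = add_awgn(tone, snr_db=-10, rng=random.Random(0))
print(round(measured_snr_db(tone, noisy), 1))
```

At -10 dB the noise carries ten times the power of the signal, which is the kind of condition in which signal features become hard to recover.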