CL May 20, 2025
WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications
Xin Li, Mengbing Liu, Li Wei et al.
Large Language Models (LLMs) have achieved impressive results across a broad array of tasks, yet their capacity for complex, domain-specific mathematical reasoning, particularly in wireless communications, remains underexplored. In this work, we introduce WirelessMathBench, a novel benchmark specifically designed to evaluate LLMs on mathematical modeling challenges in wireless communications engineering. Our benchmark consists of 587 meticulously curated questions sourced from 40 state-of-the-art research papers, spanning tasks that range from basic multiple-choice questions to complex equation completion (both partial and full), all of which rigorously adhere to physical and dimensional constraints. Through extensive experimentation with leading LLMs, we observe that while many models excel in basic recall tasks, their performance degrades significantly when reconstructing partially or fully obscured equations, exposing fundamental limitations in current LLMs. Even DeepSeek-R1, the best performer on our benchmark, achieves an average accuracy of only 38.05%, with a mere 7.83% success rate in full equation completion. By publicly releasing WirelessMathBench along with the evaluation toolkit, we aim to advance the development of more robust, domain-aware LLMs for wireless system analysis and broader engineering applications.
SP Mar 10, 2025
Onboard Terrain Classification via Stacked Intelligent Metasurface-Diffractive Deep Neural Networks from SAR Level-0 Raw Data
Mengbing Liu, Xin Li, Jiancheng An et al.
This paper introduces a novel approach for real-time onboard terrain classification from Sentinel-1 (S1) level-0 raw In-phase/Quadrature (IQ) data, leveraging a Stacked Intelligent Metasurface (SIM) to perform inference directly in the analog wave domain. Unlike conventional digital deep neural networks, the proposed multi-layer Diffractive Deep Neural Network (D$^2$NN) setup implements automatic feature extraction as electromagnetic waves propagate through stacked metasurface layers. This design not only reduces reliance on expensive downlink bandwidth and high-power computing at terrestrial stations but also achieves around 90% accuracy, precision, recall, and F1 score directly from real raw IQ data. Our method therefore helps bridge the gap between next-generation remote sensing tasks and in-orbit processing needs, paving the way for computationally efficient remote sensing applications.
LG Sep 27, 2025
WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning
Xin Li, Mengbing Liu, Yiyang Zhu et al.
Large language models (LLMs) excel at general mathematical reasoning but fail catastrophically on specialized technical mathematics. In wireless communications, where problems require precise manipulation of information-theoretic bounds, optimization constraints, and signal processing formulations, even state-of-the-art models struggle to achieve competent performance. We present WirelessMathLM, demonstrating that compact models (0.5B-7B parameters) can match or exceed much larger models through domain-specific reinforcement learning with verifiable rewards. Our key insight is that wireless mathematics problems possess a unique property, verifiable correctness, that enables effective reinforcement learning without human feedback. We construct WirelessMathBench-XL, a comprehensive benchmark of 4,027 problems from 970 papers. Using Group Relative Policy Optimization (GRPO) with binary verification rewards, we train models directly from base checkpoints without supervised warm-start. Our 7B model achieves 39.5% accuracy on WirelessMathBench-XL, approaching GPT-4o (40.4%) while using about 100 times fewer parameters than DeepSeek-R1 (671B, 57.4%). Remarkably, GRPO training improves performance at every model scale, roughly doubling it at 3B and 7B (0.5B +11%, 3B +103%, 7B +81%), with positive transfer to general mathematics benchmarks: our models gain +8.4 points on average across MATH, Minerva-Math, OlympiadBench, AMC, and AIME without any training on these tasks.
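As an illustration of how binary verification rewards can drive GRPO, the sketch below computes group-relative advantages for several sampled answers to one problem. It assumes the commonly used GRPO normalization (reward minus group mean, divided by group standard deviation); the abstract does not state the exact formulation used for WirelessMathLM. The `binary_reward` verifier, the sample answers, and the `eps` constant are hypothetical stand-ins.

```python
from statistics import mean, pstdev


def binary_reward(predicted: str, reference: str) -> float:
    """Hypothetical verifier: 1.0 on an exact match after trimming, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of answers sampled for the same problem.

    Assumed form: A_i = (r_i - mean(r)) / (std(r) + eps).
    A group whose rewards are all equal yields all-zero advantages.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to a multiple-choice question whose reference answer is "B".
answers = ["B", "C", "B", "A"]
rewards = [binary_reward(a, "B") for a in answers]
advantages = group_relative_advantages(rewards)
```

Under this assumed form, correct answers receive positive advantages and incorrect ones negative advantages, while a group that is entirely correct or entirely wrong contributes no learning signal.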