NI · May 13

WirelessSenseLLM: Zero-Shot Human Activity Understanding by Bridging Wireless Signals and Human Language

arXiv:2605.14070
AI Analysis

For wireless sensing researchers, this work enables zero-shot, language-driven interpretation of unsegmented CSI signals, overcoming limitations of segmentation and predefined labels.

WirelessSenseLLM enables zero-shot human motion understanding from unsegmented Wi-Fi CSI by bridging signals and language via a CSI-to-Language Adapter, achieving 92% accuracy and 91% F1-score in action understanding, with 30% factual and 15% reasoning improvements over prior methods.

There is growing interest in enabling wireless sensing systems to interpret human motion from unsegmented wireless signals; however, existing CSI-based applications rely heavily on accurate signal segmentation and predefined action labels, limiting their applicability in zero-shot scenarios. We present WirelessSenseLLM, a language-driven framework that leverages large language models (LLMs) to enable zero-shot human motion understanding from unsegmented Wi-Fi Channel State Information (CSI). To bridge the modality gap between time-series CSI and discrete language representations, we introduce a CSI-to-Language Adapter and a cross-modal projection mechanism that maps CSI features into a language-aligned semantic space. This design enables the generation of fine-grained natural language descriptions of sequential and overlapping human motions, supporting downstream reasoning without segmented training data. We address two core technical challenges: modality mismatch between CSI features and language embeddings, and overlapping actions in unsegmented CSI streams. Extensive experiments demonstrate strong performance in zero-shot action understanding (92% accuracy and 91% F1-score), language-based reasoning quality (30% factual and 15% reasoning improvements), and multi-person motion explanation with an average 12.33% improvement over prior methods. These results highlight WirelessSenseLLM's effectiveness for robust and interpretable human motion understanding from CSI signals.
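The abstract does not specify how the cross-modal projection is built, so the following is only a minimal sketch of the general idea under stated assumptions: an unsegmented CSI amplitude stream is cut into fixed-length patches, and each patch is linearly projected to the width of a language model's token embeddings. The patch length, embedding width, random weights, and every function name here (`patchify`, `make_projection`, `project`) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random


def patchify(csi, patch_len):
    """Split a (time x subcarrier) CSI stream into non-overlapping,
    flattened patches. Trailing samples that do not fill a patch are dropped."""
    patches = []
    for start in range(0, len(csi) - patch_len + 1, patch_len):
        window = csi[start:start + patch_len]
        patches.append([value for row in window for value in row])
    return patches


def make_projection(in_dim, out_dim, seed=0):
    """Create an untrained linear layer (weights, bias).
    In a real system these parameters would be learned; here they are random."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(patches, weights, bias):
    """Map each flattened patch to one vector of the language-embedding width."""
    return [
        [sum(w * x for w, x in zip(row, patch)) + b
         for row, b in zip(weights, bias)]
        for patch in patches
    ]


# Toy dimensions (assumptions): 20 time steps, 4 subcarriers,
# patches of 5 time steps, and an 8-dimensional embedding space.
T, S, PATCH_LEN, EMBED_DIM = 20, 4, 5, 8

rng = random.Random(1)
csi_stream = [[rng.random() for _ in range(S)] for _ in range(T)]

patches = patchify(csi_stream, PATCH_LEN)
weights, bias = make_projection(PATCH_LEN * S, EMBED_DIM)
soft_tokens = project(patches, weights, bias)

print(len(soft_tokens), len(soft_tokens[0]))  # 4 8
```

Each resulting vector plays the role of a "soft token" that could be placed alongside ordinary text embeddings in a language model's input. Because the stream is patched at a fixed stride rather than cut at action boundaries, no segmentation step is needed before projection, which is the property the abstract emphasizes.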

