CV · SP · May 28, 2025

MultiFormer: A Multi-Person Pose Estimation System Based on CSI and Attention Mechanism

arXiv:2505.22555v2 · IEEE Internet of Things Journal
Originality: Incremental advance
AI Analysis

The work addresses non-intrusive human activity monitoring for applications such as healthcare or surveillance, but it appears incremental: it builds on existing CSI-based methods and adds specific improvements.

The paper tackles accurate multi-person pose estimation from Channel State Information (CSI) by proposing MultiFormer, a system that reports higher accuracy than state-of-the-art approaches, particularly for high-mobility keypoints such as wrists and elbows.

Human pose estimation based on Channel State Information (CSI) has emerged as a promising approach for non-intrusive and precise human activity monitoring, yet it faces challenges including accurate multi-person pose recognition and effective CSI feature learning. This paper presents MultiFormer, a wireless sensing system that accurately estimates human pose through CSI. The proposed system adopts a Transformer-based time-frequency dual-token feature extractor with multi-head self-attention. This feature extractor is able to model inter-subcarrier correlations and temporal dependencies of the CSI. The extracted CSI features and the pose probability heatmaps are then fused by a Multi-Stage Feature Fusion Network (MSFN) to enforce anatomical constraints. Extensive experiments conducted on the public MM-Fi dataset and our self-collected dataset show that MultiFormer achieves higher accuracy than state-of-the-art approaches, especially for high-mobility keypoints (wrists, elbows) that are particularly difficult for previous methods to estimate accurately.
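
The abstract does not spell out how the dual-token extractor is built, so the following is only a minimal sketch of one plausible reading. It assumes the CSI is a time-by-subcarrier amplitude matrix, that "time tokens" are its rows and "frequency tokens" are its columns, and that each token set passes through self-attention separately. The function names `dual_tokens` and `self_attention` are hypothetical. The sketch uses a single head with identity projections and leaves out the learned query/key/value weights and the multiple heads that a real Transformer layer would have.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention (identity projections).

    tokens: list of n vectors, each of length d.
    Returns (outputs, weights): n output vectors and the n x n attention matrix.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    # weights[i][j]: how strongly token i attends to token j
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        for q in tokens
    ]
    # each output is the attention-weighted average of all token vectors
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


def dual_tokens(csi):
    """Assumed tokenization: rows become time tokens, columns frequency tokens."""
    time_tokens = [list(row) for row in csi]        # T tokens of length S
    freq_tokens = [list(col) for col in zip(*csi)]  # S tokens of length T
    return time_tokens, freq_tokens


# Toy CSI amplitudes (made-up values): 4 time frames x 3 subcarriers.
csi = [
    [0.9, 0.1, 0.4],
    [0.8, 0.2, 0.5],
    [0.1, 0.9, 0.3],
    [0.2, 0.8, 0.6],
]

time_tokens, freq_tokens = dual_tokens(csi)
time_out, time_w = self_attention(time_tokens)  # 4 x 4: temporal dependencies
freq_out, freq_w = self_attention(freq_tokens)  # 3 x 3: inter-subcarrier correlations
```

Under this reading, `time_w` relates frames to one another and `freq_w` relates subcarriers to one another, which matches the two kinds of structure the abstract says the extractor models. How the two branches are combined, and how they feed the MSFN, is not stated in the abstract and is not attempted here.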

