VST-Pose: A Velocity-Integrated Spatiotem-poral Attention Network for Human WiFi Pose Estimation
This provides a privacy-aware solution for continuous human motion analysis in smart home care scenarios, though it is incremental with a novel method for a known bottleneck.
The paper tackles WiFi-based human pose estimation by introducing VST-Pose, a deep learning framework that integrates velocity modeling to capture subtle motions, achieving 92.2% accuracy on PCK@50 and outperforming existing methods by 8.3% on a self-collected dataset.
WiFi-based human pose estimation has emerged as a promising non-visual alternative approaches due to its pene-trability and privacy advantages. This paper presents VST-Pose, a novel deep learning framework for accurate and continuous pose estimation using WiFi channel state information. The proposed method introduces ViSTA-Former, a spatiotemporal attention backbone with dual-stream architecture that adopts a dual-stream architecture to separately capture temporal dependencies and structural relationships among body joints. To enhance sensitivity to subtle human motions, a velocity modeling branch is integrated into the framework, which learns short-term keypoint dis-placement patterns and improves fine-grained motion representation. We construct a 2D pose dataset specifically designed for smart home care scenarios and demonstrate that our method achieves 92.2% accuracy on the PCK@50 metric, outperforming existing methods by 8.3% in PCK@50 on the self-collected dataset. Further evaluation on the public MMFi dataset confirms the model's robustness and effectiveness in 3D pose estimation tasks. The proposed system provides a reliable and privacy-aware solution for continuous human motion analysis in indoor environments. Our codes are available in https://github.com/CarmenQing/VST-Pose.