CV · AI · SP · Dec 27, 2023

Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual Semantics

arXiv:2401.05412v1 · 4 citations · h-index: 9 · AAAI
Originality: Incremental advance
AI Analysis

This work addresses ambiguous motion reconstruction in applications that use wearable devices. It is an incremental advance whose novelty lies in integrating text supervision.

The paper tackles the ambiguity in 3D human motion reconstruction from sparse IMU data by introducing a method that uses textual semantics to supervise sensor weighting and alignment, achieving significant improvements in metrics and enabling differentiation between ambiguous actions like sitting and standing.
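The summary does not say how the per-sensor weights are computed. The following is a minimal sketch, not the authors' code. It assumes each IMU branch predicts a feature vector plus a log-variance (an uncertainty estimate), and that the weights are normalised inverse variances, so less certain sensors contribute less. The names `inverse_variance_weights` and `weight_imu_features` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def inverse_variance_weights(log_vars: List[float]) -> List[float]:
    """Hypothetical helper: w_i = exp(-s_i) / sum_j exp(-s_j), with s_i = log(sigma_i^2).

    This is a softmax over the negated log-variances, which equals
    (1 / sigma_i^2) / sum_j (1 / sigma_j^2). Subtracting the max keeps exp() stable.
    """
    neg = [-s for s in log_vars]
    m = max(neg)
    exps = [math.exp(x - m) for x in neg]
    total = sum(exps)
    return [e / total for e in exps]


def weight_imu_features(features: List[Vector], log_vars: List[float]) -> Tuple[List[float], List[Vector]]:
    """Scale each IMU's feature vector by its normalised inverse-variance weight (assumed scheme)."""
    if len(features) != len(log_vars):
        raise ValueError("need exactly one log-variance per IMU")
    weights = inverse_variance_weights(log_vars)
    weighted = [[w * x for x in f] for w, f in zip(weights, features)]
    return weights, weighted


if __name__ == "__main__":
    # Three illustrative IMUs; the third is four times as uncertain (variance 4 vs 1).
    feats = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
    w, wf = weight_imu_features(feats, [0.0, 0.0, math.log(4.0)])
    print(w)   # weights ≈ [0.444, 0.444, 0.111]
    print(wf)
```

Normalising across sensors, rather than scaling each one independently, makes the weights sum to 1. They can then be read directly as each IMU's relative spatial importance for a given window.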

Leveraging wearable devices for motion reconstruction has emerged as an economical and viable technique. Some methods place sparse Inertial Measurement Units (IMUs) on the human body and use data-driven strategies to model human poses. However, reconstructing motion from sparse IMU data alone is inherently ambiguous, because identical IMU readings can correspond to many different poses. In this paper, we explore the spatial importance of multiple sensors, supervised by text that describes specific actions. Specifically, uncertainty is introduced to derive weighted features for each IMU. We also design a Hierarchical Temporal Transformer (HTT) and apply contrastive learning to achieve precise temporal and feature alignment of sensor data with textual semantics. Experimental results demonstrate that our proposed approach achieves significant improvements in multiple metrics compared to existing methods. Notably, with textual supervision, our method not only differentiates between ambiguous actions such as sitting and standing but also produces more precise and natural motion.
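The abstract does not give the exact form of the contrastive objective. The sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of paired sensor and text embeddings, using cosine similarity and a temperature τ. The function `symmetric_info_nce` and the default `temperature=0.07` are illustrative assumptions. The HTT encoder that would produce the sensor embeddings is not modelled here.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def _log_softmax_at(row: Vector, idx: int) -> float:
    """log(softmax(row)[idx]) computed with the log-sum-exp trick."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse


def symmetric_info_nce(sensor_emb: List[Vector], text_emb: List[Vector], temperature: float = 0.07) -> float:
    """Assumed CLIP-style loss: pair i (sensor_i, text_i) is the positive, all others are negatives."""
    if len(sensor_emb) != len(text_emb):
        raise ValueError("sensor and text batches must have the same size")
    s = [_normalize(v) for v in sensor_emb]
    t = [_normalize(v) for v in text_emb]
    n = len(s)
    # logits[i][j] = cos(sensor_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(s[i], t[j])) / temperature for j in range(n)] for i in range(n)]
    # Sensor -> text: row i should select column i.
    loss_s2t = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text -> sensor: column j should select row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2s = -sum(_log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_s2t + loss_t2s)


if __name__ == "__main__":
    sensors = [[1.0, 0.0], [0.0, 1.0]]
    print(symmetric_info_nce(sensors, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs: near zero
    print(symmetric_info_nce(sensors, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: large
```

Matched pairs sit on the diagonal of the similarity matrix. The loss therefore pulls each sensor window toward its own description and pushes it away from the other descriptions in the batch, for example separating a "sitting down" window from a "standing up" caption.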

