CV · Mar 26

Shared Representation for 3D Pose Estimation, Action Classification, and Progress Prediction from Tactile Signals

arXiv:2603.25906 · 21.5 · h-index: 16
AI Analysis

First unified tactile-based approach for three human-robot interaction tasks, addressing occlusion and privacy limitations of vision methods.

SCOTTI learns a shared representation from tactile signals to simultaneously perform 3D pose estimation, action classification, and progress prediction, outperforming separate task-specific models on a new 7-hour dataset from 15 participants across 8 activities.

Estimating human pose, classifying actions, and predicting movement progress are essential for human-robot interaction. Vision-based methods suffer from occlusion and privacy concerns in realistic environments; tactile sensing avoids both. However, prior tactile-based approaches handle each task separately, which leads to suboptimal performance. In this study, we propose a Shared COnvolutional Transformer for Tactile Inference (SCOTTI) that learns a shared representation to address three prediction tasks simultaneously: 3D human pose estimation, action classification, and action completion progress estimation. To the best of our knowledge, this is the first work to explore action progress prediction using foot tactile signals from custom wireless insole sensors. The unified approach exploits multi-task learning, so the model performs better on all three tasks than when each is learned independently. Experimental results show that SCOTTI also outperforms existing approaches on all three tasks. Additionally, we introduce a novel dataset of 7 hours of recordings from 15 participants performing eight different activities and exercises.
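The abstract does not say how the three objectives are combined. A common multi-task setup sums per-task losses computed from heads that share one encoder, and the sketch below illustrates only that idea. It is not SCOTTI's actual objective: the loss forms (mean squared error for pose and progress, cross-entropy over the eight action classes), the equal default weights, and every function name are assumptions made for illustration.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(pose_pred, pose_true,
                   action_logits, action_label,
                   progress_pred, progress_true,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three per-task losses (hypothetical formulation).

    pose_*:     flattened 3D joint coordinates
    action_*:   logits over action classes and the true class index
    progress_*: scalar completion fraction in [0, 1]
    weights:    assumed per-task weights, not taken from the paper
    """
    w_pose, w_action, w_progress = weights
    parts = {
        "pose": mse(pose_pred, pose_true),
        "action": cross_entropy(action_logits, action_label),
        "progress": (progress_pred - progress_true) ** 2,
    }
    total = (w_pose * parts["pose"]
             + w_action * parts["action"]
             + w_progress * parts["progress"])
    return total, parts


# Example: perfect pose and progress, uninformative logits over 8 classes.
total, parts = multitask_loss(
    pose_pred=[0.1, 0.2, 0.3], pose_true=[0.1, 0.2, 0.3],
    action_logits=[0.0] * 8, action_label=3,
    progress_pred=0.5, progress_true=0.5,
)
```

With uniform logits the action term equals the log of the number of classes, so it is the only non-zero contribution in this example. In a shared-encoder model, gradients from all three terms would flow into the same representation, which is the mechanism multi-task learning relies on.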
