CV, LG · Mar 10, 2023

Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers

arXiv:2303.05691v1 · 8 citations · h-index: 10
Originality: Incremental advance
AI Analysis

The work addresses privacy concerns and adverse vision conditions in pose estimation, but it is incremental because it builds on existing tactile-sensing and transformer methods.

The paper tackles the problem of human pose estimation from noisy and ambiguous pressure recordings by proposing a spatio-temporal masked transformer, achieving improved performance over existing solutions on two public datasets.

Despite their impressive performance, vision-based pose estimators generally perform poorly under adverse vision conditions and often do not satisfy customers' privacy demands. As a result, researchers have begun to study tactile sensing systems as an alternative. However, these systems suffer from noisy and ambiguous recordings. To tackle this problem, we propose a novel solution for pose estimation from ambiguous pressure data. Our method comprises a spatio-temporal vision transformer with an encoder-decoder architecture. Detailed experiments on two popular public datasets show that our model outperforms existing solutions in the area. Moreover, we observe that increasing the number of temporal crops in the early stages of the network improves performance, and that pre-training the network in a self-supervised setting with a masked auto-encoder approach further improves the results.
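The abstract does not spell out how the pressure sequence is tokenized or masked during pre-training. As an illustration only, the Python sketch below applies the generic masked auto-encoder (MAE) recipe to a sequence of pressure maps: split the T × H × W recording into non-overlapping spatio-temporal patches ("tubelets"), hide a random subset, feed only the visible ones to the encoder, and score reconstruction on the hidden ones. The tubelet sizes, the 0.75 mask ratio, and the names `tubelet_tokens`, `random_mask`, and `masked_mse` are hypothetical assumptions, not the paper's settings or code.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical data layout: a recording is T frames, each an H x W pressure map.
Frame = List[List[float]]


def tubelet_tokens(seq: Sequence[Frame], t: int, p: int) -> List[List[float]]:
    """Split a T x H x W pressure sequence into non-overlapping t x p x p
    tubelets and flatten each one into a token of length t * p * p."""
    T, H, W = len(seq), len(seq[0]), len(seq[0][0])
    if T % t or H % p or W % p:
        raise ValueError("T, H and W must be divisible by the tubelet sizes")
    tokens = []
    for t0 in range(0, T, t):          # temporal position of the tubelet
        for y0 in range(0, H, p):      # row position
            for x0 in range(0, W, p):  # column position
                tokens.append([
                    seq[t0 + dt][y0 + dy][x0 + dx]
                    for dt in range(t)
                    for dy in range(p)
                    for dx in range(p)
                ])
    return tokens


def random_mask(num_tokens: int, ratio: float,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """MAE-style random masking: shuffle token indices, keep the first
    int(num_tokens * (1 - ratio)) as visible and mask the rest."""
    order = list(range(num_tokens))
    rng.shuffle(order)
    num_keep = int(num_tokens * (1 - ratio))
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def masked_mse(pred: Sequence[Sequence[float]],
               target: Sequence[Sequence[float]],
               masked: Sequence[int]) -> float:
    """Mean squared reconstruction error computed only over masked tokens."""
    total, count = 0.0, 0
    for i in masked:
        for a, b in zip(pred[i], target[i]):
            total += (a - b) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy recording: 4 frames of 8 x 8 pressure values (sizes are made up).
    seq = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(4)]
    tokens = tubelet_tokens(seq, t=2, p=4)        # 2 * 2 * 2 = 8 tokens
    visible, masked = random_mask(len(tokens), ratio=0.75, rng=rng)
    encoder_input = [tokens[i] for i in visible]  # only visible tokens are encoded
    print(len(tokens), len(tokens[0]))            # 8 32
    print(len(encoder_input), len(masked))        # 2 6
```

Dropping most tokens before the encoder is what keeps MAE-style pre-training cheap. In a full model, a decoder would receive the encoded visible tokens plus learnable mask tokens and be trained with a loss like `masked_mse` against the original tubelets; whether the paper uses exactly this loss or mask ratio is not stated in the abstract.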
