Semi-supervised 3D Hand-Object Pose Estimation via Pose Dictionary Learning
This work addresses the data-collection bottleneck facing researchers and practitioners in 3D hand-object pose estimation, offering a semi-supervised approach that builds incrementally on existing methods.
The paper tackles the expensive 3D labeling required for hand-object pose estimation by proposing a semi-supervised method that combines pose dictionary learning with an object-oriented coordinate system; on the FPHA dataset, it reduces estimation error by 19.5% for hands and 24.9% for objects compared with using only labeled data.
3D hand-object pose estimation is important for understanding the interaction between humans and their environment. Current hand-object pose estimation methods require detailed 3D labels, which are expensive and labor-intensive to collect. To address this data-collection problem, we propose a semi-supervised 3D hand-object pose estimation method with two key techniques: pose dictionary learning and an object-oriented coordinate system. The pose dictionary learning module distinguishes infeasible poses by their reconstruction error, which allows unlabeled data to provide supervision signals. The object-oriented coordinate system makes 3D estimates equivariant to the camera perspective. We conduct experiments on the FPHA and HO-3D datasets. On FPHA, our method reduces estimation error by 19.5% for hands and 24.9% for objects compared with a straightforward use of labeled data alone, and it outperforms several baseline methods. Extensive experiments also validate the robustness of the proposed method.
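The two ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation: the abstract does not specify the dictionary form, so a plain linear dictionary with ridge-regularized least-squares coding is assumed here, and the object-oriented coordinate system is read as "express points in the frame attached to the object". All function names, shapes, and the 63-dimensional pose vector (21 joints times 3 coordinates) are hypothetical choices for illustration.

```python
import numpy as np

def reconstruction_error(pose, dictionary, ridge=1e-6):
    """Residual norm after coding a flattened pose with the dictionary atoms.

    pose: (D,) vector; dictionary: (K, D) with one atom per row.
    Assumption: linear dictionary with ridge-regularized least-squares codes.
    """
    A = dictionary.T  # (D, K), atoms as columns
    gram = A.T @ A + ridge * np.eye(A.shape[1])
    coeffs = np.linalg.solve(gram, A.T @ pose)
    return float(np.linalg.norm(A @ coeffs - pose))

def to_object_frame(points_cam, R_obj, t_obj):
    """Map camera-frame points (N, 3) into the object's frame.

    (R_obj, t_obj) is the object pose in the camera frame, so each point
    becomes R_obj^T (p - t_obj); with row vectors that is (p - t_obj) @ R_obj.
    """
    return (points_cam - t_obj) @ R_obj

def random_rotation(rng):
    """Draw a proper rotation matrix (determinant +1) from a QR factorization."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0
    return Q

rng = np.random.default_rng(0)

# 1) Reconstruction error as a feasibility signal (hypothetical 4-atom dictionary).
atoms = rng.normal(size=(4, 63))
feasible_pose = 0.5 * atoms[0] + 0.2 * atoms[2]          # lies in the span of the atoms
infeasible_pose = feasible_pose + rng.normal(size=63)    # pushed off the span
err_feasible = reconstruction_error(feasible_pose, atoms)
err_infeasible = reconstruction_error(infeasible_pose, atoms)

# 2) Object-frame coordinates do not change when the camera moves.
hand_cam = rng.normal(size=(21, 3))
R_obj, t_obj = random_rotation(rng), rng.normal(size=3)
hand_obj = to_object_frame(hand_cam, R_obj, t_obj)

R_cam, t_cam = random_rotation(rng), rng.normal(size=3)  # hypothetical change of camera
hand_cam_new = hand_cam @ R_cam.T + t_cam
R_obj_new, t_obj_new = R_cam @ R_obj, R_cam @ t_obj + t_cam
hand_obj_new = to_object_frame(hand_cam_new, R_obj_new, t_obj_new)

print(err_feasible, err_infeasible)
print(np.allclose(hand_obj, hand_obj_new))
```

In this sketch, a pose that the dictionary can represent yields a near-zero residual, while a perturbed pose yields a large one; a residual of this kind could be used as a loss on unlabeled samples. The second part shows why an object-centered frame removes the dependence on viewpoint: the same hand joints receive the same coordinates before and after the camera transform.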