AirGlove: Exploring Egocentric 3D Hand Tracking and Appearance Generalization for Sensing Gloves
This work addresses accurate hand tracking for sensing gloves in teleoperation and robotics. Its contribution is incremental, since it builds on existing vision-based models.
The paper tackles vision-based 3D hand tracking for gloved hands, where performance degrades because gloves look different from bare hands. It proposes AirGlove, which generalizes learned representations to new glove designs and reports a significant performance boost over the compared methods.
Sensing gloves have become important tools for teleoperation and robotic policy learning because they provide rich signals such as speed, acceleration, and tactile feedback. A common approach to tracking gloved hands is to use the sensor signals directly (e.g., angular velocity, gravity orientation) to estimate 3D hand poses. However, sensor-based tracking can be restrictive in practice, as its accuracy often depends on sensor signal quality and calibration quality. Recent vision-based approaches have achieved strong performance on human hands via large-scale pre-training, but their performance on gloved hands with distinct visual appearances remains underexplored. In this work, we present the first systematic evaluation of vision-based hand tracking models on gloved hands under both zero-shot and fine-tuning setups. Our analysis shows that existing bare-hand models suffer substantial performance degradation on sensing gloves due to the large appearance gap between bare hands and glove designs. We therefore propose AirGlove, which leverages existing gloves to generalize the learned glove representations to new gloves with limited data. Experiments with multiple sensing gloves show that AirGlove effectively generalizes hand pose models to new glove designs and achieves a significant performance boost over the compared schemes.