CV AI HC LG · Dec 24, 2025

DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors

Kaustubh Kundu, Hrishav Bakul Barua, Lucy Robertson-Bell, Zhixi Cai, Kalin Stefanov

arXiv:2512.21054v1 · 8.42 citations · h-index: 13 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of producing precise 3D sign language data for applications such as communication aids. It is an incremental advance, since it builds on prior pose estimation methods.

The authors tackle the reconstruction of accurate 3D hand and body poses from monocular sign language videos, which suffer from noise and occlusions. They introduce DexAvatar, a framework that uses learned priors and reports a 35.11% improvement over state-of-the-art methods on the SGNify benchmark.

The trend in sign language generation is centered around data-driven generative methods that require vast amounts of precise 2D and 3D human pose data to achieve acceptable generation quality. However, most current sign language datasets are video-based, limited to automatically reconstructed 2D human poses (i.e., keypoints), and lack accurate 3D information. Furthermore, the existing state of the art in automatic 3D human pose estimation from sign language videos is prone to self-occlusion, noise, and motion blur, resulting in poor reconstruction quality. In response, we introduce DexAvatar, a novel framework to reconstruct biomechanically accurate, fine-grained hand articulations and body movements from in-the-wild monocular sign language videos, guided by learned 3D hand and body priors. DexAvatar achieves strong performance on the SGNify motion capture dataset, the only benchmark available for this task, reaching an improvement of 35.11% in the estimation of body and hand poses compared to the state of the art. The official website of this work is: https://github.com/kaustesseract/DexAvatar.

