CV AI LG Oct 7, 2025

Bimanual 3D Hand Motion and Articulation Forecasting in Everyday Images

Aditya Prakash, David Forsyth, Saurabh Gupta

arXiv:2510.06145v16.21 citations h-index: 6

Originality: Incremental advance

AI Analysis

This paper addresses the problem of predicting realistic hand interactions in everyday settings for robotics and AR/VR applications, with incremental improvements in data handling and modeling.

The paper tackles forecasting bimanual 3D hand motion and articulation from a single everyday image. It reports a 14% improvement from training on diverse data with imputed labels, and gains of 42% for the lifting model and 16.4% for the forecasting model over the best baselines.

We tackle the problem of forecasting bimanual 3D hand motion & articulation from a single image in everyday settings. To address the lack of 3D hand annotations in diverse settings, we design an annotation pipeline consisting of a diffusion model to lift 2D hand keypoint sequences to 4D hand motion. For the forecasting model, we adopt a diffusion loss to account for the multimodality in hand motion distribution. Extensive experiments across 6 datasets show the benefits of training on diverse data with imputed labels (14% improvement) and effectiveness of our lifting (42% better) & forecasting (16.4% gain) models, over the best baselines, especially in zero-shot generalization to everyday images.
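The abstract mentions a diffusion loss but does not give its form. As a rough illustration only, the sketch below assumes a standard DDPM-style noise-prediction objective with a linear noise schedule; the paper may use a different parameterization. All names (`make_alpha_bar`, `add_noise`, `diffusion_loss`, `zero_denoiser`) and the tensor shapes are hypothetical, and the denoiser is a placeholder rather than the authors' model.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0); return it together with the noise used."""
    eps = rng.standard_normal(x0.shape)
    # Broadcast the per-sample schedule value over all non-batch dimensions.
    a = alpha_bar[t].reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps

def diffusion_loss(denoiser, x0, cond, alpha_bar, rng):
    """Mean squared error between the true noise and the predicted noise."""
    t = rng.integers(0, len(alpha_bar), size=x0.shape[0])
    x_t, eps = add_noise(x0, t, alpha_bar, rng)
    eps_hat = denoiser(x_t, t, cond)
    return float(np.mean((eps_hat - eps) ** 2))

def zero_denoiser(x_t, t, cond):
    """Placeholder denoiser that always predicts zero noise (not a real model)."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
# Hypothetical shapes: batch of 4, 16 future frames, 2 hands, 21 joints, 3D coordinates.
x0 = rng.standard_normal((4, 16, 2, 21, 3))
# Hypothetical image-conditioning features, one vector per sample.
cond = rng.standard_normal((4, 128))
loss = diffusion_loss(zero_denoiser, x0, cond, alpha_bar, rng)
```

Because the network is trained to denoise samples rather than regress a single future, sampling from it with different noise seeds can yield different plausible hand motions for the same image, which is the multimodality the abstract refers to.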
