CV, Apr 25, 2021

Parallel mesh reconstruction streams for pose estimation of interacting hands

arXiv:2104.12123v1 1.4

Originality: Incremental advance

AI Analysis

This work addresses the challenging problem of accurate hand pose estimation in interactive scenarios, with applications such as human-computer interaction and robotics. The advance is incremental; its novelty lies in the network architecture.

The authors tackled 3D hand pose estimation from single RGB images by introducing a multi-stream mesh reconstruction network that shares global and local information across parallel decoding paths, achieving superior performance on hand-object and hand-hand interaction datasets compared to existing methods.

We present a new multi-stream 3D mesh reconstruction network (MSMR-Net) for hand pose estimation from a single RGB image. Our model consists of an image encoder followed by a mesh-convolution decoder composed of connected graph convolution layers. In contrast to previous models that form a single mesh decoding path, our decoder network incorporates multiple cross-resolution trajectories that are executed in parallel. Thus, global and local information are shared to form rich decoding representations at minor additional parameter cost compared to the single trajectory network. We demonstrate the effectiveness of our method in hand-hand and hand-object interaction scenarios at various levels of interaction. To evaluate the former scenario, we propose a method to generate RGB images of closely interacting hands. Moreover, we suggest a metric to quantify the degree of interaction and show that close hand interactions are particularly challenging. Experimental results show that the MSMR-Net outperforms existing algorithms on the hand-object FreiHAND dataset as well as on our own hand-hand dataset.
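The idea of parallel cross-resolution decoding paths can be illustrated with a toy sketch. This is not the MSMR-Net implementation: the function names, the scalar per-vertex features, the fixed mixing weights, the tiny meshes, and the additive fusion rule are all assumptions made for illustration. It only shows how a coarse (global) stream and a fine (local) stream might each apply a graph convolution and then exchange information.

```python
# Toy sketch of two parallel decoding streams at different mesh resolutions.
# All names, sizes, weights, and the additive fusion rule are illustrative
# assumptions, not the MSMR-Net implementation.

def graph_conv(features, neighbors, w_self=0.5, w_neigh=0.5):
    """Simplified graph convolution: mix each vertex with its neighbor mean."""
    out = []
    for v, f in enumerate(features):
        nbrs = neighbors[v]
        mean = sum(features[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        out.append(w_self * f + w_neigh * mean)
    return out

def upsample(coarse, parent):
    """Coarse -> fine: each fine vertex copies its parent coarse vertex."""
    return [coarse[p] for p in parent]

def pool(fine, parent, n_coarse):
    """Fine -> coarse: average the fine vertices assigned to each coarse vertex."""
    sums = [0.0] * n_coarse
    counts = [0] * n_coarse
    for v, p in enumerate(parent):
        sums[p] += fine[v]
        counts[p] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def parallel_step(coarse, fine, coarse_nbrs, fine_nbrs, parent):
    """Convolve both streams independently, then exchange across resolutions."""
    c = graph_conv(coarse, coarse_nbrs)
    f = graph_conv(fine, fine_nbrs)
    c_out = [a + b for a, b in zip(c, pool(f, parent, len(c)))]
    f_out = [a + b for a, b in zip(f, upsample(c, parent))]
    return c_out, f_out

# Coarse mesh: 2 vertices joined by an edge; fine mesh: 4 vertices in a ring.
coarse_nbrs = [[1], [0]]
fine_nbrs = [[1, 3], [0, 2], [1, 3], [0, 2]]
parent = [0, 0, 1, 1]  # hypothetical fine-vertex -> coarse-vertex assignment

coarse, fine = [1.0, 0.0], [0.0, 0.0, 0.0, 0.0]
coarse, fine = parallel_step(coarse, fine, coarse_nbrs, fine_nbrs, parent)
```

In this toy setup, a signal that starts only in the coarse stream reaches every fine vertex after a single step, which is the kind of global-to-local sharing the abstract describes. A real decoder would use learned weight matrices and multi-channel features in place of the fixed scalars here.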

