CV · GR · Apr 20

Chatting about Upper-Body Expressive Human Pose and Shape Estimation

arXiv:2604.17959 · 7.61 citations · h-index: 1
Predicted impact: top 75% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work targets a known weak point of expressive human pose and shape estimation in AR/VR applications: accurate face and hand parameters. It proposes a framework in which body parts exchange features, so that torso, face, and hand estimates refine one another.

CoEvoer introduces a one-stage synergistic cross-dependency transformer for upper-body expressive human pose and shape estimation, achieving state-of-the-art performance on benchmarks and strong generalization to wild images.

Expressive Human Pose and Shape Estimation (EHPS) plays a crucial role in various AR/VR applications and has witnessed significant progress in recent years. However, current state-of-the-art methods still struggle with accurate parameter estimation for facial and hand regions and exhibit limited generalization to wild images. To address these challenges, we present CoEvoer, a novel one-stage synergistic cross-dependency transformer framework tailored for upper-body EHPS. CoEvoer enables explicit feature-level interaction across different body parts, allowing for mutual enhancement through contextual information exchange. Specifically, larger and more easily estimated regions such as the torso provide global semantics and positional priors to guide the estimation of finer, more complex regions like the face and hands. Conversely, the localized details captured in facial and hand regions help refine and calibrate adjacent body parts. To the best of our knowledge, CoEvoer is the first framework designed specifically for upper-body EHPS, with the goal of capturing the strong coupling and semantic dependencies among the face, hands, and torso through joint parameter regression. Extensive experiments demonstrate that CoEvoer achieves state-of-the-art performance on upper-body benchmarks and exhibits strong generalization capability even on unseen wild images.
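
The feature-level interaction described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' architecture: it assumes each body part is represented by a small set of feature tokens and lets every part attend to the tokens of the other parts through single-head scaled dot-product cross-attention with no learned projections. The names `cross_attend` and `exchange`, the token counts, and the feature size are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, context):
    """Single-head scaled dot-product cross-attention with a residual update.

    queries: (n_q, d) tokens of the part being refined.
    context: (n_ctx, d) tokens of the other parts.
    No learned projections are used; this is an illustrative assumption.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (n_q, n_ctx)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return queries + weights @ context          # residual keeps the part's own features

def exchange(parts):
    """Refine each part's tokens using the tokens of all other parts."""
    updated = {}
    for name, tokens in parts.items():
        others = np.concatenate(
            [t for n, t in parts.items() if n != name], axis=0
        )
        updated[name] = cross_attend(tokens, others)
    return updated

# Hypothetical token sets: 4 torso tokens, 2 face tokens, 2 hand tokens, 16 features each.
rng = np.random.default_rng(0)
parts = {
    "torso": rng.normal(size=(4, 16)),
    "face": rng.normal(size=(2, 16)),
    "hands": rng.normal(size=(2, 16)),
}
out = exchange(parts)
```

In this sketch the exchange is symmetric: face and hand tokens read from the torso tokens (the global context), and torso tokens read from the face and hand tokens (the local detail), mirroring the two directions of guidance the abstract describes. A real model would add learned query, key, and value projections and would regress the pose and shape parameters from the refined tokens.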

