CV · RO · Oct 8, 2025

WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation

arXiv:2510.07313v1 · 13 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses a domain-specific bottleneck in robotics, the scarcity of wrist-view data, by generating wrist views that improve manipulation. The advance is incremental because it builds on existing visual geometry models such as VGGT.

The paper tackles the problem of generating wrist-view videos for robotic manipulation from anchor views alone, since wrist-view recordings are scarce in large-scale datasets. It proposes WristWorld, a 4D world model whose generated wrist views improve VLA performance, raising the average task completion length on Calvin by 3.81% and closing 42.4% of the anchor-wrist view gap.

Wrist-view observations are crucial for VLA models as they capture fine-grained hand-object interactions that directly enhance manipulation performance. Yet large-scale datasets rarely include such recordings, resulting in a substantial gap between abundant anchor views and scarce wrist views. Existing world models cannot bridge this gap, as they require a wrist-view first frame and thus fail to generate wrist-view videos from anchor views alone. Amid this gap, recent visual geometry models such as VGGT emerge with geometric and cross-view priors that make it possible to address extreme viewpoint shifts. Inspired by these insights, we propose WristWorld, the first 4D world model that generates wrist-view videos solely from anchor views. WristWorld operates in two stages: (i) Reconstruction, which extends VGGT and incorporates our Spatial Projection Consistency (SPC) Loss to estimate geometrically consistent wrist-view poses and 4D point clouds; (ii) Generation, which employs our video generation model to synthesize temporally coherent wrist-view videos from the reconstructed perspective. Experiments on Droid, Calvin, and Franka Panda demonstrate state-of-the-art video generation with superior spatial consistency, while also improving VLA performance, raising the average task completion length on Calvin by 3.81% and closing 42.4% of the anchor-wrist view gap.
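The abstract does not spell out how the "42.4% of the anchor-wrist view gap" figure is computed. One common reading, assumed here, is the share of the difference between an anchor-only policy and a policy trained with real wrist views that is recovered when generated wrist views are used instead. The sketch below illustrates that arithmetic; the function name `gap_closed` and all input scores are hypothetical placeholders, not values reported in the paper.

```python
def gap_closed(anchor_only: float, with_generated: float, with_real: float) -> float:
    """Fraction of the anchor-vs-real-wrist gap recovered by generated wrist views.

    Assumed definition (not confirmed by the source):
        (with_generated - anchor_only) / (with_real - anchor_only)
    """
    gap = with_real - anchor_only
    if gap == 0:
        raise ValueError("no gap between anchor-only and real-wrist scores")
    return (with_generated - anchor_only) / gap


# Hypothetical average task completion lengths, chosen only for illustration.
anchor_only = 3.0      # policy trained on anchor views only
with_real = 4.0        # policy trained with real wrist views
with_generated = 3.424 # policy trained with generated wrist views

print(f"gap closed: {gap_closed(anchor_only, with_generated, with_real):.1%}")
```

Under this reading, a value of 100% would mean generated wrist views are as useful as real ones, and 0% would mean they add nothing over anchor views.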