RO · Mar 31

Efficient Camera Pose Augmentation for View Generalization in Robotic Policy Learning

arXiv:2603.29192 · 96.4 · h-index: 19
AI Analysis

This paper addresses view generalization in robotic policy learning. It offers a novel method for a known bottleneck rather than a paradigm shift.

The paper tackles poor novel-view generalization in 2D-centric visuomotor policies for robotics. It introduces GenSplat, a feed-forward 3D Gaussian Splatting framework that reconstructs 3D scenes from sparse inputs and renders synthetic views to augment the training data. The resulting policies execute robustly under severe spatial perturbations that cause baselines to degrade severely.

Prevailing 2D-centric visuomotor policies exhibit a pronounced deficiency in novel view generalization, as their reliance on static observations hinders consistent action mapping across unseen views. In response, we introduce GenSplat, a feed-forward 3D Gaussian Splatting framework that facilitates view-generalized policy learning through novel view rendering. GenSplat employs a permutation-equivariant architecture to reconstruct high-fidelity 3D scenes from sparse, uncalibrated inputs in a single forward pass. To ensure structural integrity, we design a 3D-prior distillation strategy that regularizes the 3DGS optimization, preventing the geometric collapse typical of purely photometric supervision. By rendering diverse synthetic views from these stable 3D representations, we systematically augment the observational manifold during training. This augmentation forces the policy to ground its decisions in underlying 3D structures, thereby ensuring robust execution under severe spatial perturbations where baselines severely degrade.
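The view-augmentation step described in the abstract can be illustrated with a minimal sketch of how perturbed camera poses might be sampled around a nominal viewpoint before rendering. This is not GenSplat's implementation: the function names `look_at` and `sample_perturbed_poses`, the OpenCV-style axis convention, the z-up world frame, and the jitter ranges are all assumptions made for illustration. Rendering the reconstructed scene from each sampled pose is left out.

```python
import numpy as np


def look_at(eye, target, up=(0.0, 0.0, 1.0)):
    """Build a 4x4 camera-to-world matrix for a camera at `eye` facing `target`.

    Assumed convention: camera x = right, y = down, z = forward (OpenCV-style).
    Undefined when the viewing direction is parallel to `up`.
    """
    eye = np.asarray(eye, dtype=float)
    target = np.asarray(target, dtype=float)
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)  # completes a right-handed frame
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = down
    c2w[:3, 2] = forward
    c2w[:3, 3] = eye
    return c2w


def sample_perturbed_poses(eye, target, n, max_angle_deg=15.0,
                           max_radius_scale=0.1, seed=0):
    """Sample `n` camera poses by jittering a nominal camera on a sphere around `target`.

    Azimuth and elevation are jittered uniformly by up to `max_angle_deg`, and the
    distance to the target by up to a `max_radius_scale` fraction. All ranges are
    hypothetical defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    eye = np.asarray(eye, dtype=float)
    target = np.asarray(target, dtype=float)
    offset = eye - target
    radius = np.linalg.norm(offset)
    azimuth = np.arctan2(offset[1], offset[0])
    elevation = np.arcsin(offset[2] / radius)
    max_angle = np.deg2rad(max_angle_deg)

    poses = []
    for _ in range(n):
        az = azimuth + rng.uniform(-max_angle, max_angle)
        # Clip elevation away from the poles, where look_at is undefined.
        el = np.clip(elevation + rng.uniform(-max_angle, max_angle), -1.4, 1.4)
        r = radius * (1.0 + rng.uniform(-max_radius_scale, max_radius_scale))
        direction = np.array([np.cos(el) * np.cos(az),
                              np.cos(el) * np.sin(az),
                              np.sin(el)])
        poses.append(look_at(target + r * direction, target))
    return np.stack(poses)  # shape (n, 4, 4)


if __name__ == "__main__":
    poses = sample_perturbed_poses(eye=[1.0, 0.0, 0.5], target=[0.0, 0.0, 0.0], n=4)
    print(poses.shape)
```

In a pipeline of this kind, each sampled pose would be passed to a renderer over the reconstructed scene, and the rendered image would be paired with the original action label as an extra training sample. How the action labels should be treated under a viewpoint change depends on the action parameterization, which the abstract does not specify.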
