CV | RO | Mar 26

LaMP: Learning Vision-Language-Action Policies with 3D Scene Flow as Latent Motion Prior

arXiv:2603.25399 | 91.11 citations | h-index: 15
AI Analysis

This work addresses a limitation of vision-language-action (VLA) models for robotics: they learn 3D spatial dynamics only implicitly, from 2D visual features. It is an incremental improvement, with reported gains on simulation benchmarks and in real-world experiments.

The paper tackles robotic manipulation by introducing LaMP, a dual-expert framework that uses 3D scene flow as a latent motion prior to improve action prediction. It reports the highest average success rates among the evaluated baselines on LIBERO, LIBERO-Plus, and SimplerEnv-WidowX, and an average 9.7% robustness gain over the strongest prior baseline on LIBERO-Plus out-of-distribution perturbations.

We introduce LaMP, a dual-expert Vision-Language-Action (VLA) framework that embeds dense 3D scene flow as a latent motion prior for robotic manipulation. Existing VLA models regress actions directly from 2D semantic visual features, forcing them to learn complex 3D physical interactions implicitly. This implicit learning strategy degrades under unfamiliar spatial dynamics. LaMP addresses this limitation by aligning a flow-matching Motion Expert with a policy-predicting Action Expert through gated cross-attention. Specifically, the Motion Expert generates a one-step partially denoised 3D scene flow, and its hidden states condition the Action Expert without full multi-step reconstruction. We evaluate LaMP on the LIBERO, LIBERO-Plus, and SimplerEnv-WidowX simulation benchmarks as well as in real-world experiments. Across all three simulation benchmarks, LaMP consistently outperforms the evaluated VLA baselines, achieving the highest reported average success rates under the same training budgets. On LIBERO-Plus out-of-distribution (OOD) perturbations, LaMP shows improved robustness with an average 9.7% gain over the strongest prior baseline. Our project page is available at https://summerwxk.github.io/lamp-project-page/.
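
To make the two mechanisms named in the abstract concrete, here is a minimal pure-Python sketch of (a) a single Euler step of a flow-matching ODE, which yields a partially denoised sample, and (b) gated cross-attention, in which action tokens attend to motion features. It is not the authors' implementation. The function names, the toy velocity field, the tanh-scalar gate, and the direct use of the partially denoised flow as stand-in "hidden states" are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def euler_step(x, velocity_fn, t, dt):
    """One Euler step of a flow-matching ODE: x <- x + dt * v(x, t)."""
    v = velocity_fn(x, t)
    return [[xi + dt * vi for xi, vi in zip(xr, vr)] for xr, vr in zip(x, v)]


def gated_cross_attention(action_tokens, motion_hidden, w_q, w_k, w_v, gate):
    """Action tokens (queries) attend to motion features (keys/values).

    Assumption: a scalar tanh gate scales the residual update, so gate=0
    leaves the action tokens unchanged.
    """
    q = matmul(action_tokens, w_q)
    k = matmul(motion_hidden, w_k)
    v = matmul(motion_hidden, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    attn = [softmax(row) for row in scores]
    update = matmul(attn, v)
    g = math.tanh(gate)
    return [[x + g * u for x, u in zip(xr, ur)]
            for xr, ur in zip(action_tokens, update)]


def rand_matrix(rows, cols):
    """Random Gaussian matrix standing in for learned weights or features."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4
noise = rand_matrix(3, d)    # three hypothetical scene-flow tokens, pure noise
target = rand_matrix(3, d)   # stand-in for the clean scene flow


def toy_velocity(x, t):
    """Toy velocity field pointing from the current sample to the target."""
    return [[ti - xi for xi, ti in zip(xr, tr)] for xr, tr in zip(x, target)]


# One step with dt < 1 stops short of the target: a partially denoised sample.
partial = euler_step(noise, toy_velocity, t=0.0, dt=0.5)

action = rand_matrix(2, d)   # two hypothetical action tokens
w_q, w_k, w_v = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)

closed = gated_cross_attention(action, partial, w_q, w_k, w_v, gate=0.0)
opened = gated_cross_attention(action, partial, w_q, w_k, w_v, gate=1.0)
```

With the gate at zero, `closed` equals `action`, so the motion branch contributes nothing; a nonzero gate mixes motion information into the action tokens. A gate of this kind is commonly initialized near zero so that a pretrained policy is not disturbed at the start of training, though whether LaMP does so is not stated in the abstract.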
