RO | AI | CV | Sep 25, 2025

Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations

arXiv:2509.20703v1 | 1 citation | h-index: 8
Originality: Incremental advance
AI Analysis

The paper addresses scalable robot learning from video demonstrations for manipulation tasks. Its contribution is incremental: it extends flow matching to SE(3) for probabilistic modeling of object trajectories.

The paper tackles the problem of generating feasible robot motions from human video demonstrations. It proposes the Joint Flow Trajectory Optimization framework, which balances grasp pose selection, object trajectory imitation, and collision avoidance, and validates it in simulation and real-world tasks, reporting improved performance metrics.

Learning from human video demonstrations offers a scalable alternative to teleoperation or kinesthetic teaching, but poses challenges for robot manipulators due to embodiment differences and joint feasibility constraints. We address this problem by proposing the Joint Flow Trajectory Optimization (JFTO) framework for grasp pose generation and object trajectory imitation under the video-based Learning-from-Demonstration (LfD) paradigm. Rather than directly imitating human hand motions, our method treats demonstrations as object-centric guides, balancing three objectives: (i) selecting a feasible grasp pose, (ii) generating object trajectories consistent with demonstrated motions, and (iii) ensuring collision-free execution within robot kinematics. To capture the multimodal nature of demonstrations, we extend flow matching to $\mathrm{SE}(3)$ for probabilistic modeling of object trajectories, enabling density-aware imitation that avoids mode collapse. The resulting optimization integrates grasp similarity, trajectory likelihood, and collision penalties into a unified differentiable objective. We validate our approach in both simulation and real-world experiments across diverse real-world manipulation tasks.
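To make the idea of a unified objective concrete, the following is a minimal sketch of a weighted sum of the three terms named in the abstract. It is not the authors' implementation. It assumes positions in R^3 rather than full SE(3) poses, replaces the flow-matching trajectory likelihood with a simple isotropic Gaussian around a single mean demonstration path (so it is unimodal), and models the scene as one spherical obstacle. All function names, weights, and parameters are hypothetical.

```python
import numpy as np

def grasp_similarity_cost(grasp, demo_grasp):
    # Squared distance between a candidate grasp position and the
    # demonstrated one (assumption: positions only, no orientation).
    return float(np.sum((grasp - demo_grasp) ** 2))

def trajectory_nll(traj, demo_mean, sigma=0.05):
    # Stand-in for the trajectory likelihood term: negative log-likelihood
    # (up to a constant) under an isotropic Gaussian around a mean demo path.
    # The paper instead models this density with flow matching on SE(3).
    return float(np.sum((traj - demo_mean) ** 2) / (2.0 * sigma ** 2))

def collision_penalty(traj, obstacle_center, obstacle_radius, margin=0.02):
    # Squared hinge penalty on waypoints closer than radius + margin
    # to a single spherical obstacle (hypothetical scene model).
    dist = np.linalg.norm(traj - obstacle_center, axis=1)
    violation = np.maximum(0.0, obstacle_radius + margin - dist)
    return float(np.sum(violation ** 2))

def joint_objective(grasp, traj, demo_grasp, demo_mean,
                    obstacle_center, obstacle_radius,
                    w_grasp=1.0, w_traj=1.0, w_coll=100.0):
    # Weighted sum of the three terms; the weights are illustrative only.
    return (w_grasp * grasp_similarity_cost(grasp, demo_grasp)
            + w_traj * trajectory_nll(traj, demo_mean)
            + w_coll * collision_penalty(traj, obstacle_center, obstacle_radius))

# Usage: a straight-line demo path with five waypoints along the x-axis.
demo_mean = np.zeros((5, 3))
demo_mean[:, 0] = np.linspace(0.0, 1.0, 5)
demo_grasp = np.array([0.0, 0.0, 0.0])
cost = joint_objective(demo_grasp, demo_mean, demo_grasp, demo_mean,
                       obstacle_center=np.array([0.5, 0.0, 0.0]),
                       obstacle_radius=0.1)
```

In this toy setting, a trajectory that copies the demonstration exactly still incurs a cost when the demonstrated path passes through the obstacle. This is the trade-off the abstract describes between imitation and collision-free execution. Every term is a smooth or piecewise-smooth function of the grasp and waypoints, so a gradient-based optimizer could in principle minimize the sum.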

