RO · May 2
To Do or Not to Do: Ensuring the Safety of Visuomotor Policies Learned from Demonstrations
Riad Ahmed, Moniruzzaman Akash, Momotaz Begum
Task success has historically been the primary measure of policy performance in imitation learning (IL) research. This characteristic strictly limits the ubiquitous application of IL algorithms in field robotics, where safety assurance, in addition to task success, is of paramount importance. It is often desirable for an IL-powered robot in the field not to roll out a policy, and hence score a poor performance, if safety is not guaranteed. Although this trade-off between safety and performance is well investigated in the classical control literature, policy safety is a heavily underexplored domain in IL research. There is no universal definition of safety in IL. To make things worse, many existing theoretical works on safety are notoriously difficult to extend to IL-powered robots in the field. This paper offers important insights on the safety and performance of IL policies. We propose execution guarantee, a policy-agnostic safety measure that guarantees the maximum task success for a visuomotor IL policy, despite minor run-time changes, from within a specific region in the state space. We leverage recent advances in view synthesis to identify such regions in the state space for an IL policy and explore a fundamental result on set invariance - namely, Nagumo's sub-tangentiality condition - to prove and operationalize execution guarantee from inside that region. Experiments with a Franka robot, both in simulation and in the real world, demonstrate how the proposed safety analysis allows various IL policies to achieve maximum task success with guarantee. We also demonstrate some interesting results on how a recovery policy - a by-product of the proposed safety analysis - can help to increase policy performance and thereby mitigate the safety-performance trade-off in IL.
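To make the set-invariance idea concrete, the following is a minimal sketch, not the paper's procedure. It assumes a region of the form S = {x : h(x) <= 0} with a smooth boundary, where Nagumo's sub-tangentiality condition reduces to requiring that the closed-loop vector field f satisfy grad h(x) · f(x) <= 0 at every boundary point. The unit-disk region, the two toy vector fields, and the function name `satisfies_nagumo` are all hypothetical, and checking a finite sample of boundary points is only a numerical sanity check, not a proof.

```python
import numpy as np

def satisfies_nagumo(f, grad_h, boundary_points, tol=1e-9):
    """Check grad_h(x) . f(x) <= tol at every sampled boundary point.

    For S = {x : h(x) <= 0} with a smooth boundary, this inequality is the
    sub-tangentiality condition: the field never points out of S.
    """
    return all(float(np.dot(grad_h(x), f(x))) <= tol for x in boundary_points)

# Hypothetical region: the unit disk, h(x) = ||x||^2 - 1.
def grad_h(x):
    return 2.0 * x

# Hypothetical closed-loop dynamics for illustration only.
def contracting_field(x):
    return -x  # points toward the origin, so it keeps states inside the disk

def expanding_field(x):
    return x   # points away from the origin, so states leave the disk

# Sample the boundary of the disk.
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
boundary = [np.array([np.cos(a), np.sin(a)]) for a in angles]

inward_ok = satisfies_nagumo(contracting_field, grad_h, boundary)
outward_ok = satisfies_nagumo(expanding_field, grad_h, boundary)
print(inward_ok, outward_ok)
```

In this toy setting the contracting field passes the check and the expanding field fails it, which mirrors the intuition that a policy should only be rolled out from a region it cannot leave.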
RO · May 8
Trajectory-Consistent Flow Matching for Robust Visuomotor Policy Learning
Riad Ahmed, Sujosh Nag, Moniruzzaman Akash et al.
Flow matching policies learn continuous velocity fields that transport noise to actions, enabling fast deterministic inference for robot manipulation. However, standard training optimizes a pointwise velocity objective while inference requires numerical integration of that field -- a mismatch that causes compounding trajectory errors. We propose four complementary remedies: (1) auxiliary rectified flow velocity regression that provides uniform temporal supervision across the full time interval; (2) multi-step trajectory consistency training that supervises the integrated displacement of the velocity field over trajectory segments, directly closing the train-inference gap; (3) velocity field regularization that enforces temporal smoothness, preventing oscillations that destabilize integration; and (4) fourth-order Runge-Kutta (RK4) inference that reduces global discretization error by orders of magnitude over Euler methods. Critically, these components are not independently sufficient -- RK4 without a smooth velocity field fails, and smoothness without trajectory-level supervision still drifts, as our ablation study confirms. We further pair these with a dual-view 3D point cloud encoder using two independent PointNet encoders for complementary spatial perception. On four real-robot tasks across a Franka arm and a Boston Dynamics Spot, our method achieves 70% and 60% overall success on two long-horizon multi-phase tasks where both baselines score 0%, and reaches 100% on precision tool placement. Three MetaWorld simulation tasks confirm consistent improvements, validating that trajectory-level supervision is essential for reliable policy execution.
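The gap between Euler and RK4 integration of a velocity field can be illustrated with a small sketch. This is not the paper's model: `toy_velocity` is a hypothetical stand-in for a learned velocity field, chosen because its exact flow from t = 0 to t = 1 is known in closed form (x0 · e^-1), and the step count of 4 is an arbitrary assumption.

```python
import numpy as np

def integrate_euler(v, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with forward Euler."""
    x, h = np.asarray(x0, dtype=float), 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * v(x, k * h)
    return x

def integrate_rk4(v, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with classical RK4."""
    x, h = np.asarray(x0, dtype=float), 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        k1 = v(x, t)
        k2 = v(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = v(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = v(x + h * k3, t + h)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

# Hypothetical smooth velocity field (NOT a trained policy network).
def toy_velocity(x, t):
    return -x

x0 = np.array([1.0, -2.0])
exact = x0 * np.exp(-1.0)  # closed-form solution at t = 1

euler_err = np.max(np.abs(integrate_euler(toy_velocity, x0, 4) - exact))
rk4_err = np.max(np.abs(integrate_rk4(toy_velocity, x0, 4) - exact))
print(f"Euler error: {euler_err:.2e}, RK4 error: {rk4_err:.2e}")
```

With the same number of steps, RK4 evaluates the field four times per step, so the comparison trades extra network evaluations for a much smaller discretization error. The benefit shown here relies on the toy field being smooth, which is consistent with the abstract's point that RK4 alone is not sufficient.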
RO · May 2
A Principled Approach for Creating High-fidelity Synthetic Demonstrations for Imitation Learning
Moniruzzaman Akash, Momotaz Begum
Recent advances in 3D Gaussian Splatting (3DGS) have enabled visually realistic demonstration generation from a single expert trajectory and a short multi-view scan. However, existing 3DGS-based synthesis pipelines typically generate new motions using sampling-based planners or trajectory optimization, which often deviate substantially from the expert's demonstrated path. While such deviations may be acceptable for tasks insensitive to motion shape, they discard subtle spatial and temporal structure that is critical for contact-rich and shape-sensitive manipulation, causing increased demonstration diversity to harm downstream policy learning. We argue that demonstration synthesis should treat the expert trajectory as a strong prior. Building on this principle, we propose a framework that synthesizes diverse task demonstrations while explicitly preserving expert motion structure. We model the expert trajectory using Dynamic Movement Primitives (DMPs) and retarget it to new goals, object configurations, and viewpoints within a reconstructed 3DGS scene, yielding phase-consistent, shape-preserving motion by construction. To safely realize this expert-preserving diversity in cluttered scenes, we introduce an analytic obstacle-aware DMP formulation that operates directly on the continuous density field induced by the 3DGS representation. This enables collision avoidance while minimally perturbing the nominal expert motion, unifying photorealistic rendering and geometric reasoning without additional scene representations. We evaluate our approach on a Spot mobile manipulator across three manipulation tasks with increasing sensitivity to trajectory fidelity. Compared to planner- and optimization-based synthesis, our method produces trajectories with lower deviation and collision rates and yields higher task success when training diffusion-based visuomotor policies.
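The idea of retargeting an expert trajectory to a new goal while preserving its shape can be sketched with a one-dimensional discrete DMP. This is a minimal illustration assuming the standard textbook formulation (a critically damped spring plus a learned forcing term driven by a decaying phase variable); it does not include the obstacle-aware term or the 3DGS density field described in the abstract. The gains, basis count, minimum-jerk demonstration, and the names `fit_dmp` and `rollout` are all hypothetical.

```python
import numpy as np

ALPHA, BETA, ALPHA_X = 25.0, 25.0 / 4.0, 4.0  # assumed gains (critically damped)

def fit_dmp(y_demo, dt, n_basis=30):
    """Fit forcing-term weights to one demonstration (locally weighted regression)."""
    y_demo = np.asarray(y_demo, dtype=float)
    tau = (len(y_demo) - 1) * dt
    y0, g = y_demo[0], y_demo[-1]
    yd = np.gradient(y_demo, dt)
    ydd = np.gradient(yd, dt)
    x = np.exp(-ALPHA_X * np.arange(len(y_demo)) * dt / tau)  # phase variable
    # Forcing term needed for the spring system to reproduce the demonstration.
    f_target = tau**2 * ydd - ALPHA * (BETA * (g - y_demo) - tau * yd)
    centers = np.exp(-ALPHA_X * np.linspace(0.0, 1.0, n_basis))
    widths = n_basis**1.5 / centers / ALPHA_X
    psi = np.exp(-widths * (x[:, None] - centers) ** 2)
    s = x * (g - y0)  # scaling that makes the forcing term goal-relative
    weights = (psi.T @ (s * f_target)) / (psi.T @ (s * s) + 1e-10)
    return {"w": weights, "c": centers, "h": widths, "tau": tau}

def rollout(dmp, y0, goal, dt, n_steps):
    """Integrate the DMP with forward Euler toward a (possibly new) goal."""
    y, z, x, path = float(y0), 0.0, 1.0, []
    for _ in range(n_steps):
        psi = np.exp(-dmp["h"] * (x - dmp["c"]) ** 2)
        f = (psi @ dmp["w"]) / (psi.sum() + 1e-10) * x * (goal - y0)
        zd = (ALPHA * (BETA * (goal - y) - z) + f) / dmp["tau"]
        y += z / dmp["tau"] * dt
        z += zd * dt
        x += -ALPHA_X * x / dmp["tau"] * dt
        path.append(y)
    return np.array(path)

# Hypothetical expert demonstration: a minimum-jerk reach from 0 to 1 in 1 s.
dt = 0.001
t = np.linspace(0.0, 1.0, 1001)
demo = 10 * t**3 - 15 * t**4 + 6 * t**5

dmp = fit_dmp(demo, dt)
reproduced = rollout(dmp, 0.0, 1.0, dt, 1500)  # original goal
retargeted = rollout(dmp, 0.0, 2.0, dt, 1500)  # new goal, same motion shape
print(reproduced[-1], retargeted[-1])
```

Because the forcing term is scaled by the distance from start to goal, the retargeted rollout in this sketch is a scaled copy of the reproduced one, which is one simple sense in which a DMP preserves motion shape by construction.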