RO · AI · Mar 10

From Flow to One Step: Real-Time Multi-Modal Trajectory Policies via Implicit Maximum Likelihood Estimation-based Distribution Distillation

Ju Dong, Liding Zhang, Lei Zhang, Yu Fu, Kaixin Bai, Zoltan-Csaba Marton, Zhenshan Bing, Zhaopeng Chen, Alois Christian Knoll, Jianwei Zhang

arXiv:2603.09415v1 · 10.7 · h-index: 67

Predicted impact: top 15% in RO · last 90 days
Originality: Incremental advance

AI Analysis

The method enables high-frequency closed-loop robotic manipulation, addressing a latency bottleneck for real-time applications, though the advance is incremental because it builds on existing distillation methods.

The paper tackles the problem of high latency in generative policies for robotic manipulation by distilling a multi-modal Conditional Flow Matching expert into a fast single-step student using Implicit Maximum Likelihood Estimation, achieving real-time control with preserved mode coverage and fidelity.

Generative policies based on diffusion and flow matching achieve strong performance in robotic manipulation by modeling multi-modal human demonstrations. However, their reliance on iterative Ordinary Differential Equation (ODE) integration introduces substantial latency, limiting high-frequency closed-loop control. Recent single-step acceleration methods alleviate this overhead but often exhibit distributional collapse, producing averaged trajectories that fail to execute coherent manipulation strategies. We propose a framework that distills a Conditional Flow Matching (CFM) expert into a fast single-step student via Implicit Maximum Likelihood Estimation (IMLE). A bi-directional Chamfer distance provides a set-level objective that promotes both mode coverage and fidelity, enabling preservation of the teacher's multi-modal action distribution in a single forward pass. A unified perception encoder further integrates multi-view RGB, depth, point clouds, and proprioception into a geometry-aware representation. The resulting high-frequency control supports real-time receding-horizon re-planning and improved robustness under dynamic disturbances.
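To make the set-level objective concrete, the following is a minimal NumPy sketch of a bi-directional Chamfer distance between a set of teacher trajectories and a set of student trajectories. The abstract does not give the exact formulation, so the function name `bidirectional_chamfer`, the use of squared Euclidean distance over flattened trajectories, and the equal weighting of the two terms are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def bidirectional_chamfer(teacher, student):
    """Hypothetical set-level loss between two trajectory sets.

    teacher: array of shape (N, H, D), N sampled trajectories of horizon H.
    student: array of shape (M, H, D), M one-step student samples.
    Assumption: squared Euclidean distance on flattened trajectories.
    """
    t = teacher.reshape(len(teacher), -1)
    s = student.reshape(len(student), -1)
    # Pairwise squared distances, shape (N, M).
    d2 = ((t[:, None, :] - s[None, :, :]) ** 2).sum(axis=-1)
    # Teacher -> student: every teacher sample needs a nearby student sample,
    # so a missed mode is penalized (coverage).
    coverage = d2.min(axis=1).mean()
    # Student -> teacher: every student sample needs a nearby teacher sample,
    # so averaged, off-distribution samples are penalized (fidelity).
    fidelity = d2.min(axis=0).mean()
    return coverage + fidelity, coverage, fidelity


# Toy teacher with two modes: trajectories at +1 and at -1 (H=2, D=1).
teacher = np.array([[[1.0], [1.0]], [[1.0], [1.0]],
                    [[-1.0], [-1.0]], [[-1.0], [-1.0]]])
collapsed = np.zeros_like(teacher)   # student that averages the two modes
matched = teacher.copy()             # student that reproduces both modes

loss_collapsed, _, _ = bidirectional_chamfer(teacher, collapsed)
loss_matched, _, _ = bidirectional_chamfer(teacher, matched)
print(loss_collapsed, loss_matched)
```

In this toy case the collapsed student, which outputs the average of the two modes, incurs a strictly larger loss than the student that reproduces both modes. That is the behavior the abstract attributes to the set-level objective.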
