RO AIMay 31

Implicit Drifting Policy: One-Step Action Generation via Conditional Expert Geometry

Zemin Yang, Yaoyu He, Yiming Zhong, Yuhao Zhang, Xinge Zhu, Yao Mu, Qingqiu Huang, Yuexin Ma

arXiv:2606.0109880.9

AI Analysis

For robot learning, IDP addresses the latency issue of iterative diffusion policies while maintaining action manifold adherence, offering a practical one-step solution for high-frequency control.

Implicit Drifting Policy (IDP) enables one-step action generation for high-frequency robot control by implicitly enforcing training-time correction without explicit vector field estimation, achieving competitive performance with strong one-step baselines across 2D, 3D, and real-world tasks.

Generative action policies based on diffusion or flow matching excel in behavior cloning, yet their iterative sampling is prohibitive for high-frequency robot control. While recent one-step formulations alleviate this latency, they inevitably discard the intermediate trajectory evolution that provides crucial action correction. Directly recovering this mechanism by explicitly estimating a training-time drifting field is mathematically ill-posed due to extreme conditional demonstration sparsity. We introduce Implicit Drifting Policy (IDP), a one-step imitation learning framework that brings the training-time correction of Drifting into policy learning without explicit vector field estimation. IDP extracts a conditional expert geometry from the local variation of observation-similar expert actions, and compares it against a global reference geometry to isolate condition-specific constraints. This local geometric structure adaptively weights a scalar potential objective. Combined with an expert-proximal terminal evaluation, IDP directly enforces manifold constraints on the one-step generator during training. Extensive evaluations across 2D, 3D, and real-world manipulation tasks show IDP effectively maintains adherence to valid action manifolds, improving upon explicit drifting methods and achieving competitive performance with strong one-step baselines.

View on arXiv PDF

Similar