LG · May 26

Adversarial Dual On-Policy Distillation from Expressive Flow-based Teacher

arXiv:2605.27095 · 82.0
Predicted impact: top 14% in LG · last 90 days · Originality: Highly original
AI Analysis

For embodied control tasks requiring learning from demonstrations, FA-OPD addresses the distribution mismatch problem in behavioral cloning by combining reward and action distillation, outperforming existing methods.

FA-OPD introduces adversarial dual on-policy distillation from a flow-matching teacher to improve behavioral cloning in embodied control. It outperforms strong baselines across six robot navigation, manipulation, and locomotion benchmarks and is markedly more robust when demonstrations are noisy or limited.

Learning from demonstrations in embodied control is often cast as behavioral cloning, and recent diffusion or flow-matching policies improve this paradigm by modeling multi-modal expert actions. Yet these methods remain offline supervised learners: the policy is trained only on expert states and receives no corrective signal on the states it actually visits. On-policy distillation (OPD) offers a natural remedy, but standard OPD assumes a strong fixed teacher, which is unavailable in demonstration-only control. We propose FA-OPD, an adversarial dual on-policy distillation method in which a Flow Matching (FM) teacher is learned from demonstrations and co-trained with a lightweight MLP student. The teacher provides two complementary signals on student rollouts. The reward channel learns an expert-likeness objective over state-action pairs and drives online exploration through long-horizon policy optimization. The action channel supplies dense local targets at student-visited states, stabilizing exploitation. FA-OPD couples them so that reward distillation enables generalization beyond point-wise demonstrations, while action distillation keeps exploration anchored near expert-like behavior. Across six robot navigation, manipulation, and locomotion benchmarks, FA-OPD beats strong baselines and shows much stronger robustness under noisy or limited demonstrations.
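The abstract does not state FA-OPD's exact losses. As a rough illustration only, the Python sketch below shows how the two teacher signals could be computed on one student rollout, assuming a GAIL-style discriminator for the reward channel (trained with binary cross-entropy, reward r = -log(1 - D(s, a))) and a squared-error action target for the action channel. The names `discriminator`, `teacher_policy`, `GAMMA`, and `LAMBDA_A` are hypothetical stand-ins; the flow-matching sampler, the policy-optimization update, and the teacher's co-training loop are omitted.

```python
"""Hypothetical sketch of FA-OPD-style dual teacher signals on a student rollout.

Assumptions (not from the paper): the reward channel is a GAIL-style
discriminator score D(s, a) in (0, 1) mapped to r = -log(1 - D); the action
channel is a squared-error target toward the teacher's action; GAMMA and
LAMBDA_A are illustrative values.
"""
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

GAMMA = 0.99    # hypothetical discount factor for long-horizon policy optimization
LAMBDA_A = 0.5  # hypothetical weight on the action-distillation term


def _clip(d: float, eps: float = 1e-6) -> float:
    """Keep a score strictly inside (0, 1) so the logs stay finite."""
    return min(max(d, eps), 1.0 - eps)


def discriminator_loss(expert_scores: List[float], student_scores: List[float]) -> float:
    """Adversarial side: binary cross-entropy pushing D toward 1 on expert pairs, 0 on student pairs."""
    expert_term = -sum(math.log(_clip(d)) for d in expert_scores) / len(expert_scores)
    student_term = -sum(math.log(1.0 - _clip(d)) for d in student_scores) / len(student_scores)
    return expert_term + student_term


def reward_channel(d_score: float) -> float:
    """Reward channel: higher expert-likeness D(s, a) gives a larger reward."""
    return -math.log(1.0 - _clip(d_score))


def discounted_returns(rewards: List[float], gamma: float = GAMMA) -> List[float]:
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, the long-horizon signal for policy optimization."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def action_channel_loss(student_actions: List[Vector], teacher_actions: List[Vector]) -> float:
    """Action channel: mean squared error to teacher actions at student-visited states."""
    total = 0.0
    for a_s, a_t in zip(student_actions, teacher_actions):
        total += sum((x - y) ** 2 for x, y in zip(a_s, a_t))
    return total / len(student_actions)


def dual_signals(
    states: List[Vector],
    student_actions: List[Vector],
    discriminator: Callable[[Vector, Vector], float],
    teacher_policy: Callable[[Vector], Vector],
    gamma: float = GAMMA,
    lambda_a: float = LAMBDA_A,
) -> Tuple[List[float], float]:
    """Return (returns from the reward channel, weighted action-distillation loss)."""
    rewards = [reward_channel(discriminator(s, a)) for s, a in zip(states, student_actions)]
    teacher_actions = [teacher_policy(s) for s in states]
    return discounted_returns(rewards, gamma), lambda_a * action_channel_loss(student_actions, teacher_actions)


# Toy usage with hypothetical stand-ins for the learned teacher.
states = [[1.0], [2.0]]
student_actions = [[-1.0], [-2.0]]
returns, action_loss = dual_signals(
    states,
    student_actions,
    discriminator=lambda s, a: 0.5,
    teacher_policy=lambda s: [-x for x in s],
)
print(returns, action_loss)
```

In a full training loop, the returns would feed a policy-optimization step (for example PPO) while the weighted action loss is added to the student's gradient, and the discriminator would be refit on fresh expert and student pairs each round. How FA-OPD actually balances the two channels is not specified in the abstract.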
