Wenyuan Xie

RO
h-index9
3papers
1citation
Novelty62%
AI Score46

3 Papers

ROMay 1Code
Recovering Hidden Reward in Diffusion-Based Policies

Yanbiao Ji, Qiuchang Li, Yuting Hu et al.

This paper introduces EnergyFlow, a framework that unifies generative action modeling with inverse reinforcement learning by parameterizing a scalar energy function whose gradient is the denoising field. We establish that under maximum-entropy optimality, the score function learned via denoising score matching recovers the gradient of the expert's soft Q-function, enabling reward extraction without adversarial training. Formally, we prove that constraining the learned field to be conservative reduces hypothesis complexity and tightens out-of-distribution generalization bounds. We further characterize the identifiability of recovered rewards and bound how score estimation errors propagate to action preferences. Empirically, EnergyFlow achieves state-of-the-art imitation performance on various manipulation tasks while providing an effective reward signal for downstream reinforcement learning that outperforms both adversarial IRL methods and likelihood-based alternatives. These results show that the structural constraints required for valid reward extraction simultaneously serve as beneficial inductive biases for policy generalization. The code is available at https://github.com/sotaagi/EnergyFlow.

ROOct 11, 2025
Dejavu: Post-Deployment Learning for Embodied Agents via Experience Feedback

Shaokai Wu, Yanbiao Ji, Qiuchang Li et al.

Embodied agents face a fundamental limitation: once deployed in real-world environments to perform specific tasks, they are unable to acquire new useful knowledge to enhance task performance. In this paper, we propose a general post-deployment learning framework called Dejavu, which employs an Experience Feedback Network (EFN) and augments the frozen Vision-Language-Action (VLA) policy with retrieved execution memories. EFN automatically identifies contextually successful prior action experiences and conditions action prediction on this retrieved guidance. We adopt reinforcement learning with semantic similarity rewards on EFN to ensure that the predicted actions align with past successful behaviors under current observations. During deployment, EFN continually enriches its memory with new trajectories, enabling the agent to exhibit "learning from experience" despite fixed weights. Experiments across diverse embodied tasks show that EFN significantly improves adaptability, robustness, and success rates over frozen baselines. These results highlight a promising path toward embodied agents that continually refine their behavior after deployment.

LGJun 13, 2024
How Out-of-Distribution Detection Learning Theory Enhances Transformer: Learnability and Reliability

Yijin Zhou, Yutang Ge, Wenyuan Xie et al.

Transformers excel in natural language processing and computer vision tasks. However, they still face challenges in generalizing to Out-of-Distribution (OOD) datasets, i.e. data whose distribution differs from that seen during training. OOD detection aims to distinguish outliers while preserving in-distribution (ID) data performance. This paper introduces the OOD detection Probably Approximately Correct (PAC) Theory for transformers, which establishes the conditions for data distribution and model configurations for the OOD detection learnability of transformers. It shows that outliers can be accurately represented and distinguished with sufficient data under conditions. The theoretical implications highlight the trade-off between theoretical principles and practical training paradigms. By examining this trade-off, we naturally derived the rationale for leveraging auxiliary outliers to enhance OOD detection. Our theory suggests that by penalizing the misclassification of outliers within the loss function and strategically generating soft synthetic outliers, one can robustly bolster the reliability of transformer networks. This approach yields a novel algorithm that ensures learnability and refines the decision boundaries between inliers and outliers. In practice, the algorithm consistently achieves state-of-the-art (SOTA) performance across various data formats.