LG · Mar 10

Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning

Heng Zhang, Haddy Alchaer, Arash Ajoudani, Yu She

arXiv:2603.09331v16.0 · h-index: 42

Predicted impact: top 58% in LG · last 90 days
Originality: Incremental advance

AI Analysis

The paper addresses sample efficiency and generalization in reinforcement learning for embodied agents. It offers a novel method for a known bottleneck rather than a foundational breakthrough.

The paper tackles the problem of sparse or delayed environmental feedback in reinforcement learning by introducing Reward-Zero, an implicit reward mechanism that uses language embeddings to generate progress signals from task descriptions, resulting in faster convergence and higher success rates compared to conventional methods like PPO.

We introduce Reward-Zero, a general-purpose implicit reward mechanism that transforms natural-language task descriptions into dense, semantically grounded progress signals for reinforcement learning (RL). Reward-Zero serves as a simple yet sophisticated universal reward function that leverages language embeddings for efficient RL training. By comparing the embedding of a task specification with embeddings derived from an agent's interaction experience, Reward-Zero produces a continuous, semantically aligned sense-of-completion signal. This reward supplements sparse or delayed environmental feedback without requiring task-specific engineering. When integrated into standard RL frameworks, it accelerates exploration, stabilizes training, and enhances generalization across diverse tasks. Empirically, agents trained with Reward-Zero converge faster and achieve higher final success rates than conventional methods such as PPO with common reward-shaping baselines, and in some complex tasks they succeed where hand-designed rewards could not. In addition, we develop a mini benchmark for evaluating the sense of completion during task execution via language embeddings. These results highlight the promise of language-driven implicit reward functions as a practical path toward more sample-efficient, generalizable, and scalable RL for embodied agents. Code will be released after peer review.
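
The core idea in the abstract, comparing a task embedding with an embedding of the agent's experience and using the result as a dense supplement to sparse feedback, can be illustrated with a minimal sketch. This is not the authors' implementation, since their code is unreleased. The abstract does not say which embedding model is used, how experience is turned into an embedding, or how the signal is combined with the environment reward. The sketch assumes cosine similarity as the comparison, a text description of the experience, and a weighted sum with the environment reward. The names `toy_embed`, `completion_signal`, `shaped_reward`, and `weight` are hypothetical, and `toy_embed` is a bag-of-words stand-in for a real language encoder.

```python
import numpy as np


def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in for a language-embedding model (assumption).

    Hashes each token into one of `dim` buckets and L2-normalizes the counts.
    A real system would call a pretrained text or vision-language encoder.
    """
    vec = np.zeros(dim)
    for token in text.lower().split():
        # Deterministic hash so the example is reproducible across runs.
        bucket = sum(ord(ch) for ch in token) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def completion_signal(task_emb: np.ndarray, experience_emb: np.ndarray) -> float:
    """Cosine similarity between task and experience embeddings (assumed metric)."""
    denom = np.linalg.norm(task_emb) * np.linalg.norm(experience_emb)
    if denom == 0:
        return 0.0  # No information in one of the embeddings.
    return float(np.dot(task_emb, experience_emb) / denom)


def shaped_reward(env_reward: float, task_emb: np.ndarray,
                  experience_emb: np.ndarray, weight: float = 0.1) -> float:
    """Add the dense completion signal to the (possibly sparse) env reward.

    The additive form and `weight` are assumptions, not taken from the paper.
    """
    return env_reward + weight * completion_signal(task_emb, experience_emb)


# Usage: the task text and experience descriptions are made up for illustration.
task = toy_embed("pick up the red block")
far = toy_embed("robot arm idle near wall")
near = toy_embed("gripper pick up the red block")

# With a sparse env reward of 0, the agent still receives a graded signal.
reward_far = shaped_reward(0.0, task, far)
reward_near = shaped_reward(0.0, task, near)
```

In this sketch, an experience description closer to the task text yields a larger shaped reward even while the environment reward stays at zero, which is the kind of continuous progress signal the abstract describes. In an RL loop, `shaped_reward` would replace the raw environment reward passed to the learner (for example PPO) at each step.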
