Zero-Incentive Dynamics: a look at reward sparsity through the lens of unrewarded subgoals
This work reveals a fundamental limitation of RL on sparse-reward tasks. The contribution is incremental in the sense that it critiques existing assumptions.
The paper shows that essential subgoals which yield no direct reward undermine reinforcement learning methods: state-of-the-art algorithms fail under such zero-incentive dynamics, and their performance depends on how close in time subgoal completion is to the eventual reward.
This work re-examines the commonly held assumption that the frequency of rewards is a reliable measure of task difficulty in reinforcement learning. We identify and formalize a structural challenge that undermines the effectiveness of current policy learning methods: when essential subgoals do not directly yield rewards. We characterize such settings as exhibiting zero-incentive dynamics, where transitions critical to success remain unrewarded. We show that state-of-the-art deep subgoal-based algorithms fail to leverage these dynamics and that learning performance is highly sensitive to the temporal proximity between subgoal completion and eventual reward. These findings reveal a fundamental limitation in current approaches and point to the need for mechanisms that can infer latent task structure without relying on immediate incentives.
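As a concrete illustration, the sketch below builds a hypothetical chain task. The names `KeyDoorChain`, `delay` and `gamma` are our own and do not come from the paper. The agent must complete an unrewarded subgoal and then take `delay` further steps before it receives the only reward. Reward frequency is one reward per episode for every delay, yet the discounted return credited to the subgoal transition shrinks as gamma^delay. This gives one simple intuition for why performance may be sensitive to the proximity between subgoal and reward. It is not the paper's experimental setup or its explanation.

```python
from dataclasses import dataclass


@dataclass
class KeyDoorChain:
    """Hypothetical chain task illustrating zero-incentive dynamics.

    Step 0 is the essential subgoal (e.g. picking up a key) and pays no reward.
    The agent then needs `delay` more steps to reach the goal, and only the
    final transition pays reward 1.0. This is an illustrative toy, not the
    paper's benchmark.
    """

    delay: int  # steps between completing the subgoal and receiving reward

    def optimal_rewards(self) -> list[float]:
        # Rewards along the optimal trajectory; index 0 is the subgoal transition.
        if self.delay < 1:
            raise ValueError("delay must be at least 1")
        return [0.0] * self.delay + [1.0]


def discounted_return(rewards: list[float], gamma: float) -> float:
    # Standard backward recursion G_t = r_t + gamma * G_{t+1}, evaluated at t = 0.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def reward_count(rewards: list[float]) -> int:
    # "Reward frequency": number of nonzero rewards in the episode.
    return sum(1 for r in rewards if r != 0.0)


if __name__ == "__main__":
    gamma = 0.99  # assumed discount factor for the illustration
    for delay in (1, 5, 20, 50):
        rewards = KeyDoorChain(delay).optimal_rewards()
        print(
            f"delay={delay:>2}  rewards/episode={reward_count(rewards)}  "
            f"return credited to subgoal={discounted_return(rewards, gamma):.3f}"
        )
```

Running the script prints one line per delay. The reward count stays at 1 while the credited return falls geometrically. The sketch ignores exploration and function approximation entirely, so it shows the structure of the problem rather than reproducing the paper's findings.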