LG · AI · Apr 23

Dynamical Priors as a Training Objective in Reinforcement Learning

arXiv:2604.21464

Predicted impact: top 77% in LG (last 90 days) · Originality: synthesis-oriented

AI Analysis

For RL practitioners, this work shows that training objectives alone can control temporal decision geometry. However, the results are limited to minimal environments and the contribution is incremental.

Standard RL policies can exhibit temporally incoherent behavior. DP-RL augments policy-gradient learning with an auxiliary loss derived from external state dynamics (evidence accumulation and hysteresis). This loss shapes how action probabilities evolve over time and promotes structured behavior across three minimal environments.

Standard reinforcement learning (RL) optimizes policies for reward but imposes few constraints on how decisions evolve over time. As a result, policies may achieve high performance while exhibiting temporally incoherent behavior such as abrupt confidence shifts, oscillations, or degenerate inactivity. We introduce Dynamical Prior Reinforcement Learning (DP-RL), a training framework that augments policy gradient learning with an auxiliary loss derived from external state dynamics that implement evidence accumulation and hysteresis. Without modifying the reward, environment, or policy architecture, this prior shapes the temporal evolution of action probabilities during learning. Across three minimal environments, we show that dynamical priors systematically alter decision trajectories in task-dependent ways, promoting temporally structured behavior that cannot be explained by generic smoothing. These results demonstrate that training objectives alone can control the temporal geometry of decision-making in RL agents.
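The following minimal Python sketch shows one way an auxiliary dynamical-prior loss could sit beside a policy-gradient loss. It is not the authors' DP-RL implementation, and several parts of it are assumptions:

- a scalar per-step evidence signal;
- a leaky integrator for evidence accumulation (`leak`);
- Schmitt-trigger thresholds for hysteresis (`upper`, `lower`);
- a soft target distribution (`eps`);
- a KL(prior || policy) auxiliary term;
- the weight `lam`.

All function names are hypothetical. The sketch computes loss values only. In practice, the probabilities would come from a differentiable policy, and an autodiff framework would supply the gradients.

```python
import math
from typing import List, Optional, Sequence, Tuple

# HYPOTHETICAL sketch of a DP-RL-style objective; not the authors' code.
# Assumed: a scalar evidence signal per step, a leaky integrator for evidence
# accumulation, Schmitt-trigger thresholds for hysteresis, and a
# KL(prior || policy) auxiliary term over two discrete actions.


def leaky_accumulator(evidence: Sequence[float], leak: float = 0.8) -> List[float]:
    """Evidence accumulation: z_t = leak * z_{t-1} + e_t, with z_0 = 0."""
    z, trace = 0.0, []
    for e in evidence:
        z = leak * z + e
        trace.append(z)
    return trace


def hysteresis_commitments(
    trace: Sequence[float], upper: float = 1.0, lower: float = -1.0
) -> List[Optional[int]]:
    """Commit to action 1 when z >= upper, to action 0 when z <= lower,
    and otherwise keep the previous commitment (None = not yet committed)."""
    state: Optional[int] = None
    commitments = []
    for z in trace:
        if z >= upper:
            state = 1
        elif z <= lower:
            state = 0
        commitments.append(state)
    return commitments


def prior_distribution(commit: Optional[int], eps: float = 0.1) -> List[float]:
    """Map a commitment to a soft target over actions [0, 1]."""
    if commit is None:
        return [0.5, 0.5]
    return [eps, 1.0 - eps] if commit == 1 else [1.0 - eps, eps]


def kl_divergence(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(q || p) for discrete distributions; terms with q_a = 0 contribute 0."""
    return sum(qa * math.log(qa / pa) for qa, pa in zip(q, p) if qa > 0)


def dp_rl_loss(
    policy_probs: Sequence[Sequence[float]],
    actions: Sequence[int],
    returns: Sequence[float],
    evidence: Sequence[float],
    lam: float = 0.5,
) -> Tuple[float, float, float]:
    """Return (total, pg_loss, aux_loss) for one episode.

    pg_loss is the REINFORCE surrogate -mean(G_t * log pi(a_t | s_t)).
    aux_loss is mean_t KL(prior_t || pi_t). Returns are used as given;
    the prior enters the objective only through aux_loss.
    """
    n = len(actions)
    pg_loss = -sum(g * math.log(probs[a])
                   for probs, a, g in zip(policy_probs, actions, returns)) / n
    commitments = hysteresis_commitments(leaky_accumulator(evidence))
    aux_loss = sum(kl_divergence(prior_distribution(c), probs)
                   for c, probs in zip(commitments, policy_probs)) / n
    return pg_loss + lam * aux_loss, pg_loss, aux_loss


if __name__ == "__main__":
    evidence = [0.6, 0.6, 0.6, -0.3, -0.3]
    # Prints [None, 1, 1, 1, 1]: the commitment persists as evidence weakens.
    print(hysteresis_commitments(leaky_accumulator(evidence)))
    agreeing = [[0.5, 0.5]] + [[0.1, 0.9]] * 4
    disagreeing = [[0.5, 0.5]] + [[0.9, 0.1]] * 4
    actions, returns = [1, 1, 1, 1, 1], [1.0] * 5
    print(dp_rl_loss(agreeing, actions, returns, evidence))
    print(dp_rl_loss(disagreeing, actions, returns, evidence))
```

In the example, the evidence rises and then weakens. The accumulator crosses `upper` at step 2, and hysteresis keeps the commitment even after z falls back below that threshold. A plain threshold at `upper` on the same accumulated signal would drop the commitment at step 4. The agreeing policy incurs zero auxiliary loss and the disagreeing one a positive penalty, while the rewards and returns are unchanged.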
