LG · AI · Apr 24

K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning

arXiv:2604.23056 · 29.4 · Has Code

Predicted impact: top 87% in LG · last 90 days
Originality: Synthesis-oriented

AI Analysis

For RL practitioners, this offers a principled, low-overhead alternative to heuristic reward normalization, though tested only on simple environments.

The paper introduces K-Score, a Kalman filter-based method for online reward estimation that replaces standard reward normalization in policy gradient RL, achieving faster convergence and reduced training variance on LunarLander and CartPole.

We propose a simple yet effective alternative to reward normalization in policy gradient reinforcement learning by integrating a 1D Kalman filter for online reward estimation. Instead of relying on fixed heuristics, our method recursively estimates the latent reward mean, smoothing high-variance returns and adapting to non-stationary environments. This approach incurs minimal overhead and requires no modification to existing policy architectures. Experiments on LunarLander and CartPole demonstrate that Kalman-filtered rewards significantly accelerate convergence and reduce training variance compared to standard normalization techniques. Code is available at https://github.com/Sumxiaa/Kalman_Normalization.
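To make the idea of recursively estimating a latent reward mean concrete, here is a minimal sketch of a scalar Kalman filter with a random-walk state model. It is not taken from the paper or its repository: the class name `ScalarKalman`, the helper `filter_returns`, and the noise settings `process_var` and `obs_var` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1D Kalman filter, assuming the latent reward mean follows a random walk."""

    mean: float = 0.0          # current estimate of the latent reward mean
    var: float = 1.0           # variance (uncertainty) of that estimate
    process_var: float = 1e-3  # assumed per-step drift of the true mean (Q)
    obs_var: float = 1.0       # assumed noise variance of each observation (R)

    def update(self, observed: float) -> float:
        """Fold one observed return into the estimate and return the new mean."""
        # Predict step: the mean is unchanged, the uncertainty grows by Q.
        pred_var = self.var + self.process_var
        # Kalman gain: how much to trust the new observation (between 0 and 1).
        gain = pred_var / (pred_var + self.obs_var)
        # Correct step: move the estimate toward the observation.
        self.mean += gain * (observed - self.mean)
        self.var = (1.0 - gain) * pred_var
        return self.mean


def filter_returns(returns, kf=None):
    """Run the filter over a sequence of returns; gives one estimate per input."""
    if kf is None:
        kf = ScalarKalman()
    return [kf.update(float(g)) for g in returns]
```

A larger `process_var` lets the estimate track non-stationary rewards more quickly, while a larger `obs_var` smooths more aggressively. How the filtered value enters the policy gradient update (for example as a baseline or as a replacement for the raw return) is not specified in the abstract, so that step is left out here.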

