An Error Bound for Aggregation in Approximate Dynamic Programming
Provides a theoretical guarantee for a broader class of aggregation methods used in reinforcement learning, benefiting researchers and practitioners who rely on these approximations.
The paper derives an error bound for aggregation in approximate dynamic programming, extending a previous result from hard aggregation to soft and feature-based aggregation schemes.
We consider a general aggregation framework for discounted finite-state infinite horizon dynamic programming (DP) problems. It defines an aggregate problem whose optimal cost function can be obtained off-line by exact DP and then used as a terminal cost approximation for an on-line reinforcement learning (RL) scheme. We derive a bound on the error between the optimal cost functions of the aggregate problem and the original problem. This bound was first derived by Tsitsiklis and van Roy [TvR96] for the special case of hard aggregation. Our bound is similar but applies far more broadly, including to soft aggregation and feature-based aggregation schemes.