Markov Decision Processes with Time-Varying Geometric Discounting
This work addresses a theoretical gap in MDP modeling for applications requiring time-varying discounting, but it is incremental as it extends existing game-theoretic frameworks to this specific case.
The paper tackles the problem of modeling Markov decision processes with time-varying discount factors, which is necessary for certain applications, by taking a game-theoretic approach and proving the existence of subgame perfect equilibrium (SPE) and demonstrating EXPTIME-hardness for computing it. It also shows that an approximate ε-SPE exists under milder assumptions and presents an algorithm with a time complexity bound based on the discount factor's convergence property.
Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective -- whereby each time step is treated as an independent decision maker with their own (fixed) discount factor -- and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of $ε$-SPE and show that an $ε$-SPE exists under milder assumptions. An algorithm is presented to compute an $ε$-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided.