Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function
This work addresses efficiency in reinforcement learning by offering researchers and practitioners a theoretically grounded satisficing approach. The contribution is incremental in that it builds on existing bandit problem frameworks.
The paper tackles the challenge of solving complex reinforcement learning tasks within practical time frames by proposing a satisficing strategy that seeks an action whose value is above an aspiration level rather than the optimal action. It proves that the proposed risk-sensitive satisficing model is guaranteed to find such an action and, when the aspiration level is set so that satisficing implies optimizing, achieves finite regret in K-armed bandit problems.
As reinforcement learning algorithms are applied to increasingly complicated and realistic tasks, it is becoming harder to solve such problems within a practical time frame. Hence, we focus on a \textit{satisficing} strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing ($RS$) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to $K$-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that $RS$ is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of $RS$ is upper bounded by a finite value, provided that the aspiration level is set to an "optimal level" at which satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of $RS$ with that of other representative algorithms for $K$-armed bandit problems.
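To make the setting concrete, the following is a minimal sketch of a satisficing greedy agent on a Bernoulli $K$-armed bandit. The abstract does not state the $RS$ value function, so the form used here, $(n_i / N)(E_i - \aleph)$ with $n_i$ the number of pulls of arm $i$, $N$ the total number of pulls, $E_i$ the empirical mean reward, and $\aleph$ the aspiration level, is an assumption made for illustration. The function names `rs_values` and `run_rs_bandit`, the Bernoulli rewards, and the initial round of one pull per arm are likewise hypothetical choices, not details taken from the paper.

```python
import random


def rs_values(counts, means, aspiration):
    """Assumed RS-style value: (n_i / N) * (E_i - aspiration) for each arm."""
    total = sum(counts)
    return [(n / total) * (m - aspiration) for n, m in zip(counts, means)]


def run_rs_bandit(probs, aspiration, steps, seed=0):
    """Run a greedy policy over the assumed RS values on a Bernoulli bandit.

    Returns per-arm pull counts, empirical means, and cumulative expected regret.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    best = max(probs)
    regret = 0.0
    for t in range(steps):
        if t < k:
            # Assumption: pull every arm once so each estimate is defined.
            arm = t
        else:
            values = rs_values(counts, means, aspiration)
            # Greedy choice; ties resolve to the lowest arm index.
            arm = max(range(k), key=lambda i: values[i])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
        # Regret accumulates the expected loss relative to the best arm.
        regret += best - probs[arm]
    return counts, means, regret


if __name__ == "__main__":
    # Aspiration level placed between the best and second-best arm.
    counts, means, regret = run_rs_bandit([0.2, 0.5, 0.8], 0.65, 1000)
    print(counts, regret)
```

Under this assumed form, an arm whose estimate exceeds the aspiration level gains value the more it is tried, so the greedy policy keeps exploiting it (risk-averse behavior). When every estimate is below the aspiration level, all values are negative and the less-tried arms sit closer to zero, so the greedy policy explores them (risk-prone behavior). Setting the aspiration level between the best and second-best expected rewards corresponds to the case in which satisficing implies optimizing.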