LG Sep 7, 2023

A State Representation for Diminishing Rewards

Ted Moskovitz, Samo Hromadka, Ahmed Touati, Diana Borsa, Maneesh Sahani

arXiv:2309.03710v16.64 citations, h-index: 54

Originality: Incremental advance

AI Analysis

This work addresses a gap in multitask RL for sequential tasks with diminishing rewards, offering a foundational tool for both AI and behavioral studies, though it builds incrementally on existing representations.

The paper tackles policy evaluation in reinforcement learning when rewards exhibit diminishing marginal utility. It introduces the λ representation (λR), a state representation that generalizes the successor representation and several other representations from the literature, and it examines the λR's advantages in machine learning and its usefulness for studying natural behaviors such as foraging.

A common setting in multitask reinforcement learning (RL) demands that an agent rapidly adapt to various stationary reward functions randomly sampled from a fixed distribution. In such situations, the successor representation (SR) is a popular framework which supports rapid policy evaluation by decoupling a policy's expected discounted, cumulative state occupancies from a specific reward function. However, in the natural world, sequential tasks are rarely independent, and instead reflect shifting priorities based on the availability and subjective perception of rewarding stimuli. Reflecting this disjunction, in this paper we study the phenomenon of diminishing marginal utility and introduce a novel state representation, the $λ$ representation ($λ$R), which, surprisingly, is required for policy evaluation in this setting and which generalizes the SR as well as several other state representations from the literature. We establish the $λ$R's formal properties and examine its normative advantages in the context of machine learning, as well as its usefulness for studying natural behaviors, particularly foraging.
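
To make the decoupling described in the abstract concrete, here is a minimal sketch in plain Python. It computes the SR of a fixed policy on a toy 3-state cycle, evaluates a reward vector with it, and then computes a return along one path when rewards diminish with repeated visits. The toy chain, the function names, and in particular the per-visit decay rule (a state's reward is scaled by `lam` each time it is visited) are assumptions made for illustration; the abstract does not state the exact form of diminishing reward, so consult the paper for the actual definition of the λR.

```python
# Illustrative sketch only; the chain, names, and decay rule are assumptions.
GAMMA = 0.9

# P[s][s2]: probability of moving from s to s2 under a fixed policy.
# Here: a deterministic cycle 0 -> 1 -> 2 -> 0.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]


def successor_representation(P, gamma, iters=500):
    """Iterate M <- I + gamma * P M; the fixed point is (I - gamma P)^-1."""
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        M = [
            [
                (1.0 if i == j else 0.0)
                + gamma * sum(P[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return M


def evaluate(M, reward):
    """Policy evaluation with a stationary reward: V = M r."""
    n = len(reward)
    return [sum(M[s][s2] * reward[s2] for s2 in range(n)) for s in range(n)]


def diminishing_return(path, base_reward, gamma, lam):
    """Discounted return along one path when every visit to a state
    scales that state's remaining reward by lam (assumed decay rule)."""
    visits = {}
    total = 0.0
    for t, s in enumerate(path):
        n_visits = visits.get(s, 0)
        total += (gamma ** t) * (lam ** n_visits) * base_reward[s]
        visits[s] = n_visits + 1
    return total


M = successor_representation(P, GAMMA)
V = evaluate(M, [1.0, 0.0, 0.0])
```

With `lam = 1.0` the path return reduces to the ordinary discounted return, which the SR evaluates as a single matrix-vector product for any reward vector. With `lam < 1.0` the reward collected at a state depends on how often that state has already been visited, so discounted occupancies alone no longer determine the value in this sketch.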
