LGOct 17, 2023
Reinforcement learning with non-ergodic reward increments: robustness via ergodicity transformationsDominik Baumann, Erfaun Noorani, James Price et al.
Envisioned application areas for reinforcement learning (RL) include autonomous driving, precision agriculture, and finance, which all require RL agents to make decisions in the real world. A significant challenge hindering the adoption of RL methods in these domains is the non-robustness of conventional algorithms. In particular, the focus of RL is typically on the expected value of the return. The expected value is the average over the statistical ensemble of infinitely many trajectories, which can be uninformative about the performance of the average individual. For instance, when we have a heavy-tailed return distribution, the ensemble average can be dominated by rare extreme events. Consequently, optimizing the expected value can lead to policies that yield exceptionally high returns with a probability that approaches zero but almost surely result in catastrophic outcomes in single long trajectories. In this paper, we develop an algorithm that lets RL agents optimize the long-term performance of individual trajectories. The algorithm enables the agents to learn robust policies, which we show in an instructive example with a heavy-tailed return distribution and standard RL benchmarks. The key element of the algorithm is a transformation that we learn from data. This transformation turns the time series of collected returns into one for whose increments expected value and the average over a long trajectory coincide. Optimizing these increments results in robust policies.
AIAug 14, 2023
Distinguishing Risk Preferences using Repeated GamblesJames Price, Colm Connaughton
Sequences of repeated gambles provide an experimental tool to characterize the risk preferences of humans or artificial decision-making agents. The difficulty of this inference depends on factors including the details of the gambles offered and the number of iterations of the game played. In this paper we explore in detail the practical challenges of inferring risk preferences from the observed choices of artificial agents who are presented with finite sequences of repeated gambles. We are motivated by the fact that the strategy to maximize long-run wealth for sequences of repeated additive gambles (where gains and losses are independent of current wealth) is different to the strategy for repeated multiplicative gambles (where gains and losses are proportional to current wealth.) Accurate measurement of risk preferences would be needed to tell whether an agent is employing the optimal strategy or not. To generalize the types of gambles our agents face we use the Yeo-Johnson transformation, a tool borrowed from feature engineering for time series analysis, to construct a family of gambles that interpolates smoothly between the additive and multiplicative cases. We then analyze the optimal strategy for this family, both analytically and numerically. We find that it becomes increasingly difficult to distinguish the risk preferences of agents as their wealth increases. This is because agents with different risk preferences eventually make the same decisions for sufficiently high wealth. We believe that these findings are informative for the effective design of experiments to measure risk preferences in humans.