LG AI GT GN Apr 1, 2025

Modelling bounded rational decision-making through Wasserstein constraints

Benjamin Patrick Evans, Leo Ardon, Sumitra Ganesh

arXiv:2504.03743v27.12 citations h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in reinforcement learning for modeling agent behavior with ordinal actions, offering an incremental improvement over prior methods.

The paper addresses the problem of modelling bounded rational decision-making in reinforcement learning. It proposes a Wasserstein distance-based approach that overcomes limitations of existing entropy, KL-divergence, and mutual information methods, particularly for ordinal action spaces. The resulting method accounts for action nearness, supports low-probability actions, and is computationally simple.

Modelling bounded rational decision-making through information-constrained processing provides a principled approach for representing departures from rationality within a reinforcement learning framework, while still treating decision-making as an optimization process. However, existing approaches are generally based on entropy, Kullback-Leibler (KL) divergence, or mutual information. In this work, we highlight issues with these approaches when dealing with ordinal action spaces. Specifically, entropy assumes uniform prior beliefs, missing the impact of a priori biases on decision-making. KL divergence addresses this but has no notion of "nearness" of actions. It also has several well-known, potentially undesirable properties, such as a lack of symmetry, and it requires the distributions to have the same support (e.g. positive probability for all actions). Mutual information is often difficult to estimate. Here, we propose an alternative approach for modelling bounded rational RL agents utilising Wasserstein distances. This approach overcomes the aforementioned issues. Crucially, it accounts for the nearness of ordinal actions, modelling "stickiness" in agent decisions and the unlikeliness of rapidly switching to far-away actions, while also supporting low-probability actions and zero-support prior distributions, and being simple to calculate directly.
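The contrast between KL divergence and Wasserstein distance on ordinal actions can be illustrated with a small sketch. This is not the authors' implementation: it assumes five ordinal actions indexed 0 to 4 with unit spacing and a ground cost of |i - j|, in which case the 1-Wasserstein distance reduces to the summed absolute difference of the two cumulative distributions. The function names and the example distributions are hypothetical.

```python
import math
from itertools import accumulate


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions over ordinal
    actions 0..n-1, assuming unit spacing between adjacent actions.
    In one dimension this equals the sum of |CDF_p - CDF_q|."""
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def kl_divergence(p, q):
    """KL(p || q) in nats; infinite when p puts mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # support mismatch
        total += pi * math.log(pi / qi)
    return total


# Hypothetical prior concentrated on action 0 (zero support elsewhere).
prior = [1.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]  # policy moves to the adjacent action
far = [0.0, 0.0, 0.0, 0.0, 1.0]   # policy jumps to the farthest action

w_near = wasserstein_1d(prior, near)
w_far = wasserstein_1d(prior, far)
kl_near = kl_divergence(near, prior)
kl_far = kl_divergence(far, prior)

print("Wasserstein:", w_near, w_far)
print("KL:", kl_near, kl_far)
```

Under these assumptions the Wasserstein distance grows with how far the policy moves from the prior along the action ordering, while KL divergence is infinite in both cases because the prior assigns zero probability to the chosen actions, so it cannot distinguish a nearby switch from a distant one.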
