LGFeb 3
Adversarial construction as a potential solution to the experiment design problem in large task spacesPrakhar Godara, Frederick Callaway, Marcelo G. Mattar
Despite decades of work, we still lack a robust, task-general theory of human behavior even in the simplest domains. In this paper we tackle the generality problem head-on, by aiming to develop a unified model for all tasks embedded in a task-space. In particular we consider the space of binary sequence prediction tasks where the observations are generated by the space parameterized by hidden Markov models (HMM). As the space of tasks is large, experimental exploration of the entire space is infeasible. To solve this problem we propose the adversarial construction approach, which helps identify tasks that are most likely to elicit a qualitatively novel behavior. Our results suggest that adversarial construction significantly outperforms random sampling of environments and therefore could be used as a proxy for optimal experimental design in high-dimensional task spaces.
AIAug 2, 2024
Metareasoning in uncertain environments: a meta-BAMDP frameworkPrakhar Godara, Tilman Diego Alemán
\textit{Reasoning} may be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome. However, executing $P$ itself bears costs (time, energy, limited capacity, etc.) and needs to be considered alongside explicit utility obtained by making the choice in the underlying decision problem. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to Bernoulli bandit tasks. Owing to the meta problem's complexity, our solutions are necessarily approximate. However, we introduce two novel theorems that significantly enhance the tractability of the problem, enabling stronger approximations that are robust within a range of assumptions grounded in realistic human decision-making scenarios. These results offer a resource-rational perspective and a normative framework for understanding human exploration under cognitive constraints, as well as providing experimentally testable predictions about human behavior in Bernoulli Bandit tasks.
AIMay 12, 2025
Bias or Optimality? Disentangling Bayesian Inference and Learning Biases in Human Decision-MakingPrakhar Godara
Recent studies claim that human behavior in a two-armed Bernoulli bandit (TABB) task is described by positivity and confirmation biases, implying that humans do not integrate new information objectively. However, we find that even if the agent updates its belief via objective Bayesian inference, fitting the standard Q-learning model with asymmetric learning rates still recovers both biases. Bayesian inference cast as an effective Q-learning algorithm has symmetric, though decreasing, learning rates. We explain this by analyzing the stochastic dynamics of these learning systems using master equations. We find that both confirmation bias and unbiased but decreasing learning rates yield the same behavioral signatures. Finally, we propose experimental protocols to disentangle true cognitive biases from artifacts of decreasing learning rates.