Where Do Human Heuristics Come From?
This work addresses the problem of understanding human decision-making heuristics, which matters for both cognitive science and AI. The contribution is incremental, since it builds on existing meta-learning and bounded-rationality frameworks.
The paper asks why human decision-making deviates from optimal solutions. It proposes that humans use learned, resource-bounded approximations rather than optimal algorithms known in advance, and it reports that the resulting family of models shows patterns resembling differences between individual human participants in a two-armed bandit task.
Human decision-making deviates from the optimal solution, i.e. the one that maximizes cumulative rewards, in many situations. Here we approach this discrepancy from the perspective of bounded rationality, and our goal is to provide a justification for such seemingly sub-optimal strategies. More specifically, we investigate the hypothesis that humans do not know optimal decision-making algorithms in advance, but instead employ a learned, resource-bounded approximation. The idea is formalized by combining a recently proposed meta-learning model based on Recurrent Neural Networks with a resource-bounded objective. The resulting approach is closely connected to variational inference and the Minimum Description Length principle. Empirical evidence is obtained from a two-armed bandit task. Here we observe patterns in our family of models that resemble differences between individual human participants.
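To make the phrase "resource-bounded objective" concrete, the following is a minimal sketch rather than the paper's actual formulation. It assumes the objective has the form commonly used in variational inference: expected reward minus a weighted KL divergence between a Gaussian distribution over model parameters and a standard normal prior. The function names, the choice of prior, and the trade-off weight `beta` are all hypothetical and chosen for illustration only.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in nats.

    Assumption: a diagonal Gaussian over parameters and a standard
    normal prior. This term acts as the description-length cost.
    """
    return sum(
        0.5 * (s * s + m * m - 1.0) - math.log(s)
        for m, s in zip(mu, sigma)
    )


def bounded_objective(expected_reward, mu, sigma, beta):
    """Expected reward penalized by the complexity of the parameters.

    Assumption: `beta` (hypothetical name) sets how costly complexity
    is. A larger beta pushes the parameters toward the prior, which
    corresponds to a more tightly resource-bounded decision-maker.
    """
    return expected_reward - beta * kl_to_standard_normal(mu, sigma)


if __name__ == "__main__":
    # Parameters equal to the prior cost nothing, so the objective
    # reduces to the expected reward.
    print(bounded_objective(10.0, [0.0, 0.0], [1.0, 1.0], beta=2.0))
    # Moving one mean away from the prior incurs a complexity cost.
    print(bounded_objective(10.0, [1.0, 0.0], [1.0, 1.0], beta=2.0))
```

Under these assumptions, sweeping `beta` yields a family of models ranging from nearly unconstrained to heavily constrained, which is one way to read the abstract's reference to a "family of models".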