Does Calibration Affect Human Actions?
This work addresses the problem of improving non-experts' reliance on AI predictions, but the contribution is incremental: it builds on existing concepts from calibration and behavioral economics.
The study investigates how calibrating a machine learning classifier affects the decisions of non-expert humans. It finds that calibration alone is insufficient: a prospect theory correction is crucial for increasing the correlation between human decisions and model predictions. Self-reported trust, however, was unaffected.
Calibration has been proposed as a way to enhance the reliability and adoption of machine learning classifiers. We study a particular aspect of this proposal: how does calibrating a classification model affect the decisions made by non-expert humans consuming the model's predictions? We perform a Human-Computer Interaction (HCI) experiment to ascertain the effect of calibration on (i) trust in the model, and (ii) the correlation between decisions and predictions. We also propose further corrections to the reported calibrated scores based on Kahneman and Tversky's prospect theory from behavioral economics, and study the effect of these corrections on trust and decision-making. We find that calibration is not sufficient on its own; the prospect theory correction is crucial for increasing the correlation between human decisions and the model's predictions. While this increased correlation suggests higher trust in the model, responses to ``Do you trust the model more?'' are unaffected by the method used.
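The abstract does not spell out the form of the prospect theory correction, so the following is only a minimal sketch of one plausible reading. It assumes the Tversky-Kahneman (1992) probability weighting function with their estimate gamma = 0.61 for gains, and it assumes the correction reports the probability whose perceived weight equals the calibrated score, i.e. the inverse of that weighting function. The function names `tk_weight` and `prospect_corrected` are hypothetical and not taken from the paper.

```python
def tk_weight(p: float, gamma: float = 0.61) -> float:
    """Perceived weight of a stated probability p under the
    Tversky-Kahneman (1992) weighting function (assumed form)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def prospect_corrected(p_calibrated: float, gamma: float = 0.61,
                       tol: float = 1e-10) -> float:
    """Probability q to report so that tk_weight(q) matches the
    calibrated score. Found by bisection, which relies on tk_weight
    being increasing in p (true for gamma = 0.61)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tk_weight(mid, gamma) < p_calibrated:
            lo = mid  # perceived weight too small: report a higher value
        else:
            hi = mid  # perceived weight large enough: report a lower value
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical calibrated scores, for illustration only.
    for score in (0.05, 0.5, 0.9):
        reported = prospect_corrected(score)
        print(f"calibrated={score:.2f} reported={reported:.4f} "
              f"perceived={tk_weight(reported):.4f}")
```

Under these assumptions, high calibrated scores are reported as even higher values and low scores as even lower ones, which offsets the tendency to underweight large probabilities and overweight small ones.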