Why Trust in AI May Be Inevitable
This paper addresses the problem of trust and explanation in human-AI interaction, particularly for users of sophisticated AI systems. Its contribution is incremental, since it builds on existing formal models to derive a theoretical result.
The paper asks whether explanation is necessary for trust in AI. It formalizes explanation as a search process and shows that explanation can fail even under ideal conditions because of time constraints, leaving trust as the default. The result implies that as AI systems such as Large Language Models generate spurious explanations, humans may trust without genuine understanding, creating risks of misplaced trust and imperfect knowledge integration.
In human-AI interactions, explanation is widely seen as necessary for enabling trust in AI systems. We argue, however, that trust may be a prerequisite because explanation is sometimes impossible. We derive this result from a formalization of explanation as a search process through knowledge networks, where explainers must find paths between shared concepts and the concept to be explained, within finite time. Our model reveals that explanation can fail even under theoretically ideal conditions: when actors are rational, honest, motivated, can communicate perfectly, and possess overlapping knowledge. This is because successful explanation requires not just the existence of shared knowledge but also finding the connection path within time constraints, and it can therefore be rational to cease attempts at explanation before the shared knowledge is discovered. This result has important implications for human-AI interaction: as AI systems, particularly Large Language Models, become more sophisticated and able to generate superficially compelling but spurious explanations, humans may default to trust rather than demand genuine explanations. This creates risks of both misplaced trust and imperfect knowledge integration.
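
The search framing can be illustrated with a small sketch. This is not the paper's formal model: it assumes the knowledge network is a directed graph stored as an adjacency dictionary, that the explainer searches breadth-first from the shared concepts, and that time is counted in node expansions. The names `explain`, `graph`, `shared`, `target`, and `budget` are hypothetical and chosen only for illustration.

```python
from collections import deque


def explain(graph, shared, target, budget):
    """Search for a path from any shared concept to the target concept.

    Illustrative assumptions: `graph` maps each concept to the concepts it
    links to, and `budget` caps the number of concepts that may be expanded.
    Returns the path as a list of concepts, or None if the budget runs out
    (or no path exists) before the target is reached.
    """
    parent = {concept: None for concept in shared}
    queue = deque(shared)
    expanded = 0
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through parents to rebuild the explanation path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if expanded >= budget:
            # Time is up: stop explaining even though a path may exist.
            return None
        expanded += 1
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Toy knowledge network (made up for this example): a single chain
# leading from one shared concept "a" to the concept to be explained.
graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["target"]}

short_attempt = explain(graph, ["a"], "target", budget=2)
long_attempt = explain(graph, ["a"], "target", budget=10)
```

In this toy network a connecting path exists in both runs, yet `short_attempt` is `None` because the budget is exhausted first, while `long_attempt` recovers the full path. That mirrors the abstract's point that the existence of shared knowledge does not by itself guarantee a successful explanation within finite time.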