A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks
The paper addresses safety concerns for AI systems with social intelligence, but its contribution as a survey is incremental.
The paper surveys evaluations of Theory of Mind capabilities in Large Language Models, identifies associated safety risks, and proposes research directions for mitigation.
Theory of Mind (ToM), the ability to attribute mental states to others and predict their behaviour, is fundamental to social intelligence. In this paper, we survey studies evaluating behavioural and representational ToM in Large Language Models (LLMs), identify important safety risks from advanced LLM ToM capabilities, and suggest several research directions for effective evaluation and mitigation of these risks.