AI · May 19

OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

Sharmin Sultana Srishty, Kazi Mahathir Rahman, Malaika Parizat Sakkhi, Samia Shahid Prianna, Shaikhul Islam Sinat

arXiv:2605.20423 · 43.5 · Has Code

Predicted impact: top 79% in AI · last 90 days · Originality: Incremental advance

AI Analysis

For researchers working on social reasoning in LLMs, OSCToM provides a way to generate challenging ToM scenarios that improve performance on recursive belief reasoning benchmarks.

OSCToM introduces a method to generate adversarial training data for high-order Theory of Mind reasoning in LLMs, achieving 76% accuracy on FANToM (vs. 0.2% for ExploreToM) with 6x more efficient data synthesis.

Large Language Models (LLMs) perform well on many language tasks, but their Theory of Mind (ToM) reasoning is still uneven in complex social settings. Existing benchmarks, including ExploreToM, do not always test the recursive beliefs and information asymmetries that make these settings difficult. This paper presents OSCToM (Observer-Self Conflict Theory of Mind), an approach for modeling nested belief conflicts in LLM-based ToM tasks. The key case is one in which an observer's view of another agent conflicts with the observer's own belief state. Such cases go beyond simple perspective-taking and require recursive, multi-layered reasoning. OSCToM combines reinforcement learning (RL), an extended domain-specific language, and compositional surrogate models to generate observer-self conflicts. In our experiments, OSCToM-8B gives the best overall result among the systems tested. It improves on the reported ExploreToM results on FANToM and remains competitive on Hi-ToM and BigToM. On the information-asymmetric FANToM benchmark, OSCToM reaches 76% accuracy, compared with the 0.2% reported by ExploreToM. The data-synthesis procedure is also 6x more efficient, indicating that targeted training data can help smaller models handle advanced cognitive reasoning. The project code is available at https://github.com/sharminsrishty/osct.
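To make the observer-self conflict concrete, here is a minimal Python sketch of a Sally-Anne style scenario. It is a toy illustration only, not the paper's DSL, RL generator, or surrogate models. The names `Move`, `nested_belief`, and `has_observer_self_conflict` are hypothetical. The belief rule is also an assumption: an agent's model of another agent updates only on events that both of them witness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """An object is moved to `location`, seen by the agents in `witnesses`."""
    location: str
    witnesses: frozenset


def nested_belief(events, chain):
    """Where chain[0] thinks chain[1] thinks ... the object is.

    Simplifying assumption (ours, not the paper's): a nested belief is
    updated only by events witnessed by every agent in the chain.
    """
    belief = None
    for event in events:
        if all(agent in event.witnesses for agent in chain):
            belief = event.location
    return belief


def has_observer_self_conflict(events, observer, other):
    """True when the observer's own belief differs from the belief
    the observer attributes to `other`."""
    own = nested_belief(events, [observer])
    attributed = nested_belief(events, [observer, other])
    return own is not None and attributed is not None and own != attributed


# Sally and Anne both see the object placed in the basket;
# Anne alone sees it moved to the box.
events = [
    Move("basket", frozenset({"Anne", "Sally"})),
    Move("box", frozenset({"Anne"})),
]

print(has_observer_self_conflict(events, "Anne", "Sally"))
print(has_observer_self_conflict(events, "Sally", "Anne"))
```

In this toy setting Anne believes the object is in the box but attributes a basket belief to Sally, so her own belief and her view of Sally conflict. Sally has no such conflict, because every event she saw was also seen by Anne. Longer chains such as `["Anne", "Sally", "Anne"]` give higher-order beliefs under the same rule.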

