Spontaneous High-Order Generalization in Neural Theory-of-Mind Networks
This addresses the problem of developing more human-like cognitive systems in AI, though it appears incremental as it builds on existing ToM research with a specific network design.
The study tackled whether neural networks can spontaneously generalize from first- to higher-order Theory-of-Mind (ToM) without advanced skills, finding that a minimal neural ToM network achieved accuracies well above chance for second- and third-order ToM, with patterns aligning with human cognitive expectations.
Theory-of-Mind (ToM) is a core human cognitive capacity for attributing mental states to self and others. Wimmer and Perner demonstrated that humans progress from first- to higher-order ToM within a short span, completing this development before formal education or advanced skill acquisition. In contrast, neural networks represented by autoregressive language models progress from first- to higher-order ToM only alongside gains in advanced skills like reasoning, leaving open whether their trajectory can unfold independently, as in humans. In this research, we provided evidence that neural networks could spontaneously generalize from first- to higher-order ToM without relying on advanced skills. We introduced a neural Theory-of-Mind network (ToMNN) that simulated a minimal cognitive system, acquiring only first-order ToM competence. Evaluations of its second- and third-order ToM abilities showed accuracies well above chance. Also, ToMNN exhibited a sharper decline when generalizing from first- to second-order ToM than from second- to higher orders, and its accuracy decreased with greater task complexity. These perceived difficulty patterns were aligned with human cognitive expectations. Furthermore, the universality of results was confirmed across different parameter scales. Our findings illuminate machine ToM generalization patterns and offer a foundation for developing more human-like cognitive systems.