CL · Feb 19

Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests

Anton Dzega, Aviad Elyashar, Ortal Slobodin, Odeya Cohen, Rami Puzis

arXiv:2602.17108v1 · 0.6 · h-index: 28

Originality: Incremental advance

AI Analysis

This work provides a novel projective psychological framework for evaluating LMMs, addressing the need for non-language-based personality assessment in AI, though it is incremental in applying existing psychometric tools to new models.

This study assessed the personality traits of Large Multimodal Models (LMMs) using the Thematic Apperception Test (TAT) and found that while models understand interpersonal dynamics and self-concept well, they consistently fail to perceive and regulate aggression, with larger and more recent models outperforming smaller ones across assessment dimensions.

The Thematic Apperception Test (TAT) is a psychometrically grounded, multidimensional assessment framework that systematically differentiates between cognitive-representational and affective-relational components of personality-like functioning. It is a projective psychological test designed to uncover unconscious aspects of personality. This study examines whether the personality traits of Large Multimodal Models (LMMs) can be assessed through non-language-based modalities, scoring TAT responses with the Social Cognition and Object Relations Scale - Global (SCORS-G). LMMs are employed in two distinct roles: as subject models (SMs), which generate stories in response to TAT images, and as evaluator models (EMs), which assess these narratives using the SCORS-G framework. The evaluators demonstrated an excellent ability to understand and analyze TAT responses, and their interpretations were highly consistent with those of human experts. The assessment results indicate that all models understand interpersonal dynamics very well and have a good grasp of the concept of self. However, they consistently fail to perceive and regulate aggression. Performance varied systematically across model families, with larger and more recent models consistently outperforming smaller and earlier ones across SCORS-G dimensions.
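
The two-role setup described above (subject models write stories, evaluator models score them) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `assess`, the callable interfaces for the models, and the choice to average ratings over images and evaluators are all assumptions. The eight dimension abbreviations and the 1 to 7 rating range follow the standard SCORS-G scale.

```python
from statistics import mean
from typing import Callable, Dict, List

# Standard SCORS-G dimensions, each rated on a 1-7 scale.
# AGG is the aggression dimension; SE and ICS relate to the self.
SCORS_G_DIMENSIONS = ["COM", "AFF", "EIR", "EIM", "SC", "AGG", "SE", "ICS"]

# Hypothetical interfaces: a subject model maps an image reference to a
# story, and an evaluator model maps a story to one rating per dimension.
SubjectModel = Callable[[str], str]
EvaluatorModel = Callable[[str], Dict[str, int]]


def assess(
    images: List[str],
    subject_model: SubjectModel,
    evaluator_models: List[EvaluatorModel],
) -> Dict[str, float]:
    """Return the mean rating per SCORS-G dimension.

    Averaging over all images and evaluators is an assumption made for
    this sketch; the paper may aggregate differently.
    """
    if not images or not evaluator_models:
        raise ValueError("need at least one image and one evaluator")

    ratings: Dict[str, List[int]] = {dim: [] for dim in SCORS_G_DIMENSIONS}
    for image in images:
        story = subject_model(image)          # SM role: tell a story
        for evaluate in evaluator_models:     # EM role: score the story
            scores = evaluate(story)
            for dim in SCORS_G_DIMENSIONS:
                score = scores[dim]
                if not 1 <= score <= 7:
                    raise ValueError(f"{dim} rating {score} outside 1-7")
                ratings[dim].append(score)
    return {dim: mean(values) for dim, values in ratings.items()}
```

In practice the two callables would wrap calls to actual multimodal models; stubs that return fixed stories and ratings are enough to exercise the aggregation logic.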
