58.5CYApr 9
Communicate-Predict-Act: Evaluating Social Intelligence of AgentsDavid Shoresh, Sarit Kraus, Yonatan Loewenstein
As large language model (LLM) agents become more prevalent in real world social settings, social intelligence will play an increasingly critical role. But social intelligence is still a poorly defined construct, for humans and artificial agents. We introduce a multiplayer arena of mixed cooperative and competitive social games to study LLM social intelligence. The controllability of LLM based agents enables systematic evaluation, which also supports broader inferences about social intelligence per se. We evaluated eight diverse LLMs (24B to 1T parameters) using a Communicate Predict Act (COMPACT) interaction protocol and fine grained probing of social dynamics. Elo style ratings reveal consistent performance differences across models, but this scalar measure provides only a partial characterization of social intelligence. To address this limitation, we analyze gameplay traces to extract sociocognitive metrics capturing action prediction, communicative influence, strategic reasoning, and tradeoffs under conflicting interests. These sociocognitive metrics exhibit strong intramodel consistency and they reliably predict pairwise agent advantage in game outcomes (AUC ROC = 0.82). Feature importance analysis indicates that surprisingly, influence, transparency, and adaptability are more predictive of success than Theory of Mind inference or deep planning. Together, our results advance a testable, multidimensional conception of social intelligence and provide empirical insights into the capacities that underpin it.
HCSep 18, 2025
Subject Matter Expertise vs Professional Management in Collective Sequential Decision MakingDavid Shoresh, Yonatan Loewenstein
Your company's CEO is retiring. You search for a successor. You can promote an employee from the company familiar with the company's operations, or recruit an external professional manager. Who should you prefer? It has not been clear how to address this question, the "subject matter expertise vs. professional manager debate", quantitatively and objectively. We note that a company's success depends on long sequences of interdependent decisions, with often-opposing recommendations of diverse board members. To model this task in a controlled environment, we utilize chess - a complex, sequential game with interdependent decisions which allows for quantitative analysis of performance and expertise (since the states, actions and game outcomes are well-defined). The availability of chess engines differing in style and expertise, allows scalable experimentation. We considered a team of (computer) chess players. At each turn, team members recommend a move and a manager chooses a recommendation. We compared the performance of two manager types. For manager as "subject matter expert", we used another (computer) chess player that assesses the recommendations of the team members based on its own chess expertise. We examined the performance of such managers at different strength levels. To model a "professional manager", we used Reinforcement Learning (RL) to train a network that identifies the board positions in which different team members have relative advantage, without any pretraining in chess. We further examined this network to see if any chess knowledge is acquired implicitly. We found that subject matter expertise beyond a minimal threshold does not significantly contribute to team synergy. Moreover, performance of a RL-trained "professional" manager significantly exceeds that of even the best "expert" managers, while acquiring only limited understanding of chess.