CV | Mar 28, 2025

SocialGen: Modeling Multi-Human Social Interaction with Language Models

Salesforce, Stanford
arXiv:2503.22906v1 | 9 citations | h-index: 64
Originality: Highly original
AI Analysis

The paper addresses the challenge of modeling social interaction for applications such as robotics and virtual agents, and it extends prior methods that handle only two-person interactions.

The paper tackles the problem of modeling multi-human social interactions by introducing SocialGen, a unified motion-language model that supports varying numbers of individuals, achieving state-of-the-art performance across motion-language tasks.

Human interactions in everyday life are inherently social, involving engagements with diverse individuals across various contexts. Modeling these social interactions is fundamental to a wide range of real-world applications. In this paper, we introduce SocialGen, the first unified motion-language model capable of modeling interaction behaviors among varying numbers of individuals, to address this crucial yet challenging problem. Unlike prior methods that are limited to two-person interactions, we propose a novel social motion representation that supports tokenizing the motions of an arbitrary number of individuals and aligning them with the language space. This alignment enables the model to leverage rich, pretrained linguistic knowledge to better understand and reason about human social behaviors. To tackle the challenges of data scarcity, we curate a comprehensive multi-human interaction dataset, SocialX, enriched with textual annotations. Leveraging this dataset, we establish the first comprehensive benchmark for multi-human interaction tasks. Our method achieves state-of-the-art performance across motion-language tasks, setting a new standard for multi-human interaction modeling.

