CL AI Oct 24, 2023

DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding

arXiv:2310.18359v1134 citations
h-index: 44
Originality Synthesis-oriented
AI Analysis

This paper addresses the problem of unreliable benchmarks for social intelligence research, though it is incremental because it builds on an existing dataset.

The authors examine biases in the Social-IQ benchmark for social intelligence, showing that a language model can exploit them to achieve perfect performance without the context, and introduce DeSIQ, a new dataset that significantly reduces these biases.

Social intelligence is essential for understanding and reasoning about human expressions, intents and interactions. One representative benchmark for its study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice questions on videos of complex social interactions. We define a comprehensive methodology to study the soundness of Social-IQ, as the soundness of such benchmark datasets is crucial to the investigation of the underlying research problem. Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model to learn spurious correlations to achieve perfect performance without being given the context or even the question. We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ. Our empirical analysis shows DeSIQ significantly reduces the biases in the original Social-IQ dataset. Furthermore, we examine and shed light on the effect of model size, model style, learning settings, commonsense knowledge, and multi-modality on the new benchmark performance. Our new dataset, observations and findings open up important research questions for the study of social intelligence.
