CL · AI · Dec 11, 2023

EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models

arXiv:2312.06281v2 · 15.865 citations · h-index: 2 · Has Code
Originality: Synthesis-oriented
AI Analysis

EQ-Bench gives researchers and developers a tool for benchmarking emotional intelligence in LLMs, though the contribution is incremental because it builds on existing benchmarking approaches.

The authors tackle the problem of evaluating emotional intelligence in large language models by introducing EQ-Bench, a benchmark that assesses a model's ability to predict the intensity of characters' emotional states in dialogues. They find that EQ-Bench scores correlate strongly with broad intelligence benchmarks such as MMLU (r=0.97).

We introduce EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). We assess the ability of LLMs to understand complex emotions and social interactions by asking them to predict the intensity of emotional states of characters in a dialogue. The benchmark is able to discriminate effectively between a wide range of models. We find that EQ-Bench correlates strongly with comprehensive multi-domain benchmarks like MMLU (Hendrycks et al., 2020) (r=0.97), indicating that we may be capturing similar aspects of broad intelligence. Our benchmark produces highly repeatable results using a set of 60 English-language questions. We also provide open-source code for an automated benchmarking pipeline at https://github.com/EQ-bench/EQ-Bench and a leaderboard at https://eqbench.com
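The two ideas in the abstract, rating emotion intensities against a reference and correlating benchmark scores across models, can be illustrated with a minimal sketch. This is not the authors' scoring formula or pipeline (see the linked repository for that). The 0-10 intensity scale, the function names `question_score` and `pearson_r`, and the emotion labels are all assumptions made for illustration.

```python
from math import sqrt


def question_score(predicted, reference, max_intensity=10.0):
    """Hypothetical per-question score (NOT the official EQ-Bench formula).

    Both arguments map emotion names to intensity ratings on an assumed
    0..max_intensity scale. Returns 1.0 for a perfect match and decreases
    linearly with the mean absolute gap between prediction and reference.
    """
    if predicted.keys() != reference.keys():
        raise ValueError("predicted and reference must rate the same emotions")
    mean_gap = sum(abs(predicted[e] - reference[e]) for e in reference) / len(reference)
    return 1.0 - mean_gap / max_intensity


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up ratings for one dialogue question (illustrative only).
    reference = {"anger": 8, "relief": 0, "sadness": 5, "surprise": 1}
    predicted = {"anger": 7, "relief": 2, "sadness": 5, "surprise": 0}
    print("question score:", question_score(predicted, reference))

    # Placeholder per-model scores, not results from the paper.
    eq_scores = [30.0, 45.0, 60.0]
    other_scores = [40.0, 55.0, 75.0]
    print("pearson r:", pearson_r(eq_scores, other_scores))
```

Averaging `question_score` over all questions gives one number per model. Computing `pearson_r` between those per-model numbers and the same models' scores on another benchmark yields a coefficient like the r reported in the abstract.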

Code Implementations: 1 repo
