CL · AI · May 20, 2025

A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations

arXiv:2505.14106v2 · 17 citations · h-index: 37 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses researchers' and developers' need for better evaluation of LLMs in personalized, multi-turn conversations. It is incremental, however, because it builds on existing benchmarks by combining personalization and conversational elements.

The authors address the problem of evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs) by creating PersonaConvBench, a benchmark that integrates personalization with conversational structure. In their experiments, incorporating personalized conversation history yields substantial performance improvements, including a 198% relative gain in sentiment classification over the best non-conversational baseline.

We present PersonaConvBench, a large-scale benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). Unlike existing work that focuses on either personalization or conversational structure in isolation, PersonaConvBench integrates both, offering three core tasks: sentence classification, impact regression, and user-centric text generation across ten diverse Reddit-based domains. This design enables systematic analysis of how personalized conversational context shapes LLM outputs in realistic multi-user scenarios. We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements, including a 198 percent relative gain over the best non-conversational baseline in sentiment classification. By releasing PersonaConvBench with evaluations and code, we aim to support research on LLMs that adapt to individual styles, track long-term context, and produce contextually rich, engaging responses.
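
To make the "198 percent relative gain" figure concrete, the following is a minimal sketch of how a relative gain is conventionally computed. The function name and the two scores are hypothetical illustrations, not values or code from the paper; the abstract reports only the relative gain itself.

```python
def relative_gain_percent(baseline: float, score: float) -> float:
    """Relative improvement of `score` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (score - baseline) / baseline * 100.0


# Hypothetical scores chosen only to illustrate the arithmetic:
# a 198% relative gain means the new score is 2.98 times the baseline.
baseline_score = 0.25        # assumed best non-conversational baseline
personalized_score = 0.745   # assumed score with personalized history

gain = relative_gain_percent(baseline_score, personalized_score)
print(f"relative gain: {gain:.1f}%")
```

Note that a relative gain depends heavily on the baseline: the same absolute improvement produces a much larger percentage when the baseline score is low.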

Code Implementations: 1 repo