AI · IR · Jan 8

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Tingyu Wu, Zhisheng Chen, Ziyan Weng, Shuhe Wang, Chenglong Li, Shuo Zhang, Sen Hu, Silin Wu, Qizhen Lan, Huacan Wang, Ronghao Chen

arXiv:2601.04745v1 · 12.86 citations · h-index: 3 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for better benchmarks of person understanding in AI companions, though it is incremental, building on existing memory benchmarks.

The authors tackled the problem of evaluating person understanding for lifelong digital companions by introducing KnowMe-Bench, a benchmark built from long-form autobiographical narratives. Their evaluation revealed that retrieval-augmented systems improve factual accuracy but still struggle with temporally grounded explanations and higher-level inferences.

Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, where actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at https://github.com/QuantaAlpha/KnowMeBench.
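To make "flashback-aware, time-anchored stream" and "evidence-linked questions" more concrete, here is a minimal Python sketch under stated assumptions. It is not the benchmark's actual schema or pipeline. The `Segment` and `Question` classes, the integer `story_time` anchor, the category labels, and the rule that flags a segment as a flashback when its story time precedes the latest time already narrated are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One narrative unit (hypothetical schema, not KnowMe-Bench's format)."""
    seg_id: str           # hypothetical evidence identifier
    narration_index: int  # position of the segment in the original text
    story_time: int       # hypothetical time anchor, e.g. narrator's age
    text: str


@dataclass
class Question:
    """A question tied to the segments that support its answer (hypothetical)."""
    text: str
    category: str         # hypothetical labels: "factual", "subjective_state", "principle"
    evidence_ids: list[str]


def time_anchored_stream(segments: list[Segment]) -> list[tuple[Segment, bool]]:
    """Reorder segments chronologically and flag flashbacks.

    Assumed rule: a segment is a flashback if its story time is earlier
    than the latest story time narrated before it.
    """
    in_narration_order = sorted(segments, key=lambda s: s.narration_index)
    latest = None
    is_flashback: dict[str, bool] = {}
    for s in in_narration_order:
        is_flashback[s.seg_id] = latest is not None and s.story_time < latest
        latest = s.story_time if latest is None else max(latest, s.story_time)
    # Chronological order; ties keep their narration order.
    chronological = sorted(in_narration_order,
                           key=lambda s: (s.story_time, s.narration_index))
    return [(s, is_flashback[s.seg_id]) for s in chronological]


def evidence_is_valid(q: Question, stream: list[tuple[Segment, bool]]) -> bool:
    """Check that a question cites at least one segment and every cited ID exists."""
    known_ids = {s.seg_id for s, _ in stream}
    return bool(q.evidence_ids) and all(e in known_ids for e in q.evidence_ids)


# Usage with made-up segments (illustrative only).
segments = [
    Segment("s1", 0, 30, "I quit my job to start a bakery."),
    Segment("s2", 1, 10, "As a child I baked bread with my grandmother."),
    Segment("s3", 2, 31, "The bakery's first year nearly broke me."),
]
stream = time_anchored_stream(segments)
why_q = Question("Why did the narrator open a bakery?", "principle", ["s1", "s2"])
```

Here `s2` is narrated second but happens first, so the sketch moves it to the front of the stream and flags it as a flashback. A principle-level question such as `why_q` can then cite both the decision (`s1`) and its earlier motivating context (`s2`), which is the kind of temporally grounded linkage the abstract reports retrieval alone handles poorly.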

