AI | IR | Jan 8

KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

arXiv:2601.04745v1 | 6 citations | h-index: 3 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better benchmarks of person understanding in AI companions. Its contribution is incremental, since it builds on existing memory benchmarks.

The authors evaluate person understanding for lifelong digital companions by introducing KnowMe-Bench, a benchmark built from long-form autobiographical narratives. Their results show that retrieval-augmented systems improve factual accuracy but still struggle with temporally grounded explanations and higher-level inferences.

Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, where actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is available at https://github.com/QuantaAlpha/KnowMeBench.
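To make the abstract's terms concrete, here is a minimal Python sketch of what a flashback-aware, time-anchored stream and evidence-linked questions could look like. It is an illustration under stated assumptions, not the benchmark's actual schema or scoring: the `Event` and `Question` classes, their field names, the category labels, the sample narrative, and the exact-match metric are all hypothetical and are not taken from the KnowMeBench repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical fields; the real dataset format may differ.
    event_id: str
    narrative_pos: int  # position in the order the author tells the story
    story_time: int     # when the event happened (here: a year)
    text: str


@dataclass(frozen=True)
class Question:
    # Hypothetical fields; category names are illustrative labels.
    category: str        # "factual", "subjective_state", or "principle"
    prompt: str
    gold: str
    evidence_ids: tuple  # ids of the events that support the gold answer


def time_anchored_stream(events):
    """Order events chronologically; ties keep narrative order."""
    return sorted(events, key=lambda e: (e.story_time, e.narrative_pos))


def flashbacks(events):
    """Ids of events narrated after a chronologically later event."""
    latest = None
    found = []
    for e in sorted(events, key=lambda e: e.narrative_pos):
        if latest is not None and e.story_time < latest:
            found.append(e.event_id)
        latest = e.story_time if latest is None else max(latest, e.story_time)
    return found


def accuracy_by_category(questions, predictions):
    """Exact-match accuracy per category (an assumed, simplistic metric)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for q, pred in zip(questions, predictions):
        totals[q.category] += 1
        hits[q.category] += int(pred.strip().lower() == q.gold.strip().lower())
    return {c: hits[c] / totals[c] for c in totals}


# Invented toy narrative: the second event is told as a flashback.
events = [
    Event("e1", 0, 2010, "I quit my job at the bank."),
    Event("e2", 1, 1998, "As a child I watched my father work double shifts."),
    Event("e3", 2, 2011, "I opened a small bakery."),
]
questions = [
    Question("factual", "What did the narrator open?", "a bakery", ("e3",)),
    Question("principle", "What does the narrator value most?", "autonomy", ("e1", "e2")),
]

stream_ids = [e.event_id for e in time_anchored_stream(events)]
flashback_ids = flashbacks(events)
scores = accuracy_by_category(questions, ["A bakery", "security"])
```

In this toy case the stream reorders the events to `e2`, `e1`, `e3`, and the per-category scores separate a correct factual answer from a missed principle-level one, which mirrors the kind of gap the abstract reports for retrieval-augmented systems.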

