CL · Jun 3

LifeSide: Benchmarking Agents as Lifelong Digital Companions

arXiv:2606.04660 · 50.2
AI Analysis

For researchers developing AI companions, this benchmark reveals a critical gap in current evaluations, showing that existing models cannot maintain long-term user understanding and emotional connection.

The paper introduces LifeSide, a benchmark for evaluating lifelong digital companions across multi-session memory, emotion, and environment loops. Experiments with 2,000 personas and 111K tasks show that even models saturating current memory benchmarks fail to sustain accurate user understanding and companionship over long horizons.

Lifelong digital companions must integrate cross-session cues, continually update their understanding of users, and adapt to shifting privacy boundaries. Existing evaluations fail to capture this because they test memory recall and short-term empathy in isolation. To bridge this gap, we introduce LifeSide, a benchmark centered on multi-session Memory-Emotion-Environment loops. By modeling users as persistent worlds with layered profiles and event trajectories, LifeSide uses multi-agent simulation to project environmental dynamics into dialogue, preserving the critical gap between latent thoughts and observable expressions. Across 2,000 personas and 111K tasks spanning memory tracking, user understanding, privacy control, and emotional companionship, our experiments show that even models that saturate current memory benchmarks fail to sustain accurate user understanding and true companionship over long horizons.
