CL · Jun 3

LifeSide: Benchmarking Agents as Lifelong Digital Companions

Yuqian Wu, Zhijie Deng, Wei Chen, Junwei Li, Yutian Jiang, Junle Chen, Zhengjun Huang, Qingxiang Liu, Jing Tang, Jiaheng Wei, Yuxuan Liang

arXiv:2606.04660

AI Analysis

For researchers developing AI companions, this benchmark reveals a critical gap in current evaluations, showing that existing models cannot maintain long-term user understanding and emotional connection.

The paper introduces LifeSide, a benchmark for evaluating lifelong digital companions across multi-session memory, emotion, and environment loops. Experiments with 2,000 personas and 111K tasks show that even models saturating current memory benchmarks fail to sustain accurate user understanding and companionship over long horizons.

Lifelong digital companions must integrate cross-session cues, continually update their understanding of users, and adapt to shifting privacy boundaries. Existing evaluations fail to capture this because they test memory recall and short-term empathy in isolation. To bridge this gap, we introduce LifeSide, a benchmark centered on multi-session Memory-Emotion-Environment loops. By modeling users as persistent worlds with layered profiles and event trajectories, LifeSide uses multi-agent simulation to project environmental dynamics into dialogue, preserving the critical gap between latent thoughts and observable expressions. Evaluating 2,000 personas and 111K tasks across memory tracking, user understanding, privacy control, and emotional companionship, our experimental results reveal a stark reality: even models that saturate current memory benchmarks fail to sustain accurate user understanding and true companionship over long horizons.
