LUCid: Redefining Relevance For Lifelong Personalization
For researchers and developers of lifelong personalization systems, this work identifies and measures a critical gap in how relevance is operationalized, with implications for robustness and safety.
Current personalization systems rely on semantic proximity and therefore miss relevant user information from topically unrelated interactions. LUCid, a benchmark of 1,936 queries paired with interaction histories of up to 500 sessions, reveals that retrieval recall drops to near zero on the hardest instances and response alignment remains near 50% even for state-of-the-art models like Gemini-3-Flash and GPT-5.4, exposing a fundamental mismatch in how relevance is defined.
Current approaches to lifelong personalization operationalize relevance through semantic proximity, causing them to miss essential user information from topically unrelated interactions. To address this gap, we introduce LUCid, a benchmark designed to measure situational user-centric relevance in personalization. The benchmark consists of 1,936 realistic queries paired with interaction histories spanning up to 500 sessions. Across multiple architectures, our experiments show significant performance collapse when relevant context must be surfaced from semantically distant history: retrieval recall drops to near zero on the hardest instances, and response alignment remains near 50% even for state-of-the-art models such as Gemini-3-Flash, GPT-5.4, and Claude Haiku. These results expose a fundamental mismatch between the notion of relevance encoded by current systems and the situational relevance required for personalization, with direct implications for robustness and safety when critical user attributes remain undetected. LUCid enables systematic evaluation of whether current models can surface situationally relevant user information from previous interactions, and serves as a step toward realigning personalization with user-centered relevance.