IR CL May 13

RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search

arXiv:2605.13052 73.41 citations
AI Analysis

For commercial web search engines, this provides a scalable method to dynamically determine content relevance based on semantic expiration rather than chronological age.

Baidu deployed an LLM-based framework that predicts query-specific content expiration, replacing static time-window filtering. Online A/B tests showed significant improvements in search freshness and user experience metrics.

In commercial web search, aligning content freshness with user intent remains challenging because the lifespan of information varies widely. Traditional industrial approaches rely on static time-window filtering, resulting in "one-size-fits-all" rankings where content may be chronologically recent but semantically expired. To address this limitation, we present a novel Large Language Model (LLM)-based Query-Aware Dynamic Content Expiration Prediction Framework deployed in Baidu search, which reformulates timeliness as a dynamic validity inference task. Our framework extracts fine-grained temporal contexts from documents and leverages LLMs to deduce a query-specific "validity horizon", a semantic boundary defining when information becomes obsolete based on user intent. The approach is integrated with robust hallucination mitigation strategies to ensure reliability, and has been evaluated through offline experiments and online A/B testing on live production traffic. Results demonstrate significant improvements in search freshness and user experience metrics, validating the effectiveness of LLM-driven reasoning for solving semantic expiration at industrial scale.
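
The contrast between static time-window filtering and a query-specific validity horizon can be illustrated with a minimal Python sketch. This is not the paper's implementation: `Document`, `static_window_filter`, `horizon_filter`, and `toy_horizon` are hypothetical names, the LLM call is replaced by a hand-written rule, and the fallback to a default window is only an assumed stand-in for the unspecified hallucination mitigation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Document:
    doc_id: str
    published: datetime


def static_window_filter(docs: List[Document], now: datetime,
                         window: timedelta) -> List[Document]:
    """Baseline: keep documents newer than one fixed window for every query."""
    return [d for d in docs if now - d.published <= window]


def horizon_filter(query: str, docs: List[Document], now: datetime,
                   infer_horizon: Callable[[str, Document], Optional[timedelta]],
                   fallback: timedelta) -> List[Document]:
    """Keep documents whose age is within a query-specific validity horizon.

    infer_horizon stands in for the LLM inference step (assumption). If it
    returns None or a non-positive duration, the fallback window is used.
    """
    kept = []
    for d in docs:
        horizon = infer_horizon(query, d)
        if horizon is None or horizon <= timedelta(0):
            horizon = fallback  # assumed guard against unusable model output
        if now - d.published <= horizon:
            kept.append(d)
    return kept


def toy_horizon(query: str, doc: Document) -> Optional[timedelta]:
    """Hypothetical rule-based stand-in for the LLM; values are illustrative."""
    if "score" in query:
        return timedelta(days=1)      # live results go stale quickly
    if "recipe" in query:
        return timedelta(days=3650)   # evergreen content stays valid for years
    return None                       # no opinion: caller uses the fallback


now = datetime(2026, 5, 13)
docs = [
    Document("a", datetime(2026, 5, 12, 12)),
    Document("b", datetime(2026, 4, 1)),
    Document("c", datetime(2020, 1, 1)),
]
month = timedelta(days=30)

static_ids = [d.doc_id for d in static_window_filter(docs, now, month)]
score_ids = [d.doc_id for d in horizon_filter("football score", docs, now, toy_horizon, month)]
recipe_ids = [d.doc_id for d in horizon_filter("bread recipe", docs, now, toy_horizon, month)]
```

In this toy setup the static 30-day window returns the same single document for every query, while the horizon-based filter keeps all three documents for the recipe query and only the newest one for the score query.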

