From Relevance to Authority: Authority-aware Generative Retrieval in Web Search Engines
For web search engines in high-stakes domains like healthcare and finance, AuthGR addresses the critical problem of retrieving trustworthy documents by integrating authority into generative retrieval.
AuthGR is the first generative retrieval framework that incorporates document authority alongside relevance, using multimodal authority scoring and a three-stage training pipeline. In offline tests, a 3B model matched a 14B baseline, and online A/B tests on a commercial search platform showed significant improvements in user engagement and reliability.
Generative information retrieval (GenIR) formulates retrieval as a text-to-text generation task, leveraging the vast knowledge of large language models. However, existing work primarily optimizes for relevance and often overlooks document trustworthiness. This gap is critical in high-stakes domains such as healthcare and finance, where relying solely on semantic relevance risks retrieving unreliable information. To address it, we propose an Authority-aware Generative Retriever (AuthGR), the first framework that incorporates authority into GenIR. AuthGR consists of three key components: (i) Multimodal Authority Scoring, which employs a vision-language model to quantify authority from textual and visual cues; (ii) a Three-stage Training Pipeline that progressively instills authority awareness into the retriever; and (iii) a Hybrid Ensemble Pipeline for robust deployment. Offline evaluations demonstrate that AuthGR improves both authority and accuracy, with our 3B model matching a 14B baseline. Crucially, large-scale online A/B tests and human evaluations conducted on a commercial web search platform confirm significant improvements in real-world user engagement and reliability.
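To make the relevance-versus-authority trade-off concrete, the following is a minimal sketch of how an authority signal could change a ranking. It is not the AuthGR method: the abstract does not specify a scoring rule, so the linear blend, the `authority_weight` parameter, the `Candidate` type, and all scores and document names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    doc_id: str
    relevance: float  # hypothetical relevance score in [0, 1]
    authority: float  # hypothetical authority score in [0, 1]


def rerank(candidates, authority_weight=0.3):
    """Sort candidates by a weighted blend of relevance and authority.

    The linear blend and the default weight are illustrative assumptions,
    not the scoring rule used by AuthGR.
    """
    if not 0.0 <= authority_weight <= 1.0:
        raise ValueError("authority_weight must be in [0, 1]")

    def blended(c):
        return (1.0 - authority_weight) * c.relevance + authority_weight * c.authority

    # Highest blended score first.
    return sorted(candidates, key=blended, reverse=True)


# Made-up candidates: the most relevant one has low authority.
candidates = [
    Candidate("forum-post", relevance=0.90, authority=0.20),
    Candidate("health-agency", relevance=0.85, authority=0.95),
    Candidate("blog", relevance=0.60, authority=0.50),
]

relevance_only = [c.doc_id for c in rerank(candidates, authority_weight=0.0)]
authority_aware = [c.doc_id for c in rerank(candidates, authority_weight=0.3)]
```

With these made-up scores, ranking by relevance alone places the low-authority forum post first, while the blended ranking moves the higher-authority source to the top. A real system would need to obtain both scores from trained models rather than fixed constants.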