IR · CL · Jun 27, 2025

HyReC: Exploring Hybrid-based Retriever for Chinese

arXiv:2506.21913v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses a gap in hybrid retrieval for Chinese, offering a tailored solution that could benefit applications in Chinese information retrieval, though it appears incremental as it builds on existing hybrid paradigms.

The paper tackles the underexplored problem of applying hybrid-based retrieval methods to Chinese. It introduces HyReC, an end-to-end optimization method that integrates the semantic union of terms into the representation model and adds a Global-Local-Aware Encoder (GLAE) and a Normalization Module (NM) to align lexicon-based and dense retrieval. The method is evaluated on the C-MTEB retrieval benchmark.

Hybrid-based retrieval methods, which unify dense-vector and lexicon-based retrieval, have garnered considerable attention in the industry due to performance enhancement. However, despite their promising results, the application of these hybrid paradigms in Chinese retrieval contexts has remained largely underexplored. In this paper, we introduce HyReC, an innovative end-to-end optimization method tailored specifically for hybrid-based retrieval in Chinese. HyReC enhances performance by integrating the semantic union of terms into the representation model. Additionally, it features the Global-Local-Aware Encoder (GLAE) to promote consistent semantic sharing between lexicon-based and dense retrieval while minimizing the interference between them. To further refine alignment, we incorporate a Normalization Module (NM) that fosters mutual benefits between the retrieval approaches. Finally, we evaluate HyReC on the C-MTEB retrieval benchmark to demonstrate its effectiveness.
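For readers unfamiliar with the general idea, hybrid retrieval as described above (unifying dense-vector and lexicon-based scores) can be illustrated with a minimal sketch. This is not HyReC's method: the min-max normalization, the weighting parameter `alpha`, the function names, and the toy vectors and term weights below are all assumptions chosen for illustration, and the paper's GLAE and Normalization Module are not reproduced here.

```python
import math


def dense_score(q_vec, d_vec):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    nq = math.sqrt(sum(a * a for a in q_vec))
    nd = math.sqrt(sum(b * b for b in d_vec))
    return dot / (nq * nd) if nq and nd else 0.0


def lexical_score(q_weights, d_weights):
    """Dot product over the terms shared by two sparse term-weight maps."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)


def min_max(scores):
    """Rescale scores to [0, 1] so the two signals are comparable (assumed choice)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(query, docs, alpha=0.5):
    """Rank docs by a weighted sum of normalized dense and lexical scores."""
    dense = min_max([dense_score(query["vec"], d["vec"]) for d in docs])
    lexical = min_max([lexical_score(query["terms"], d["terms"]) for d in docs])
    fused = [alpha * x + (1 - alpha) * y for x, y in zip(dense, lexical)]
    order = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return [(docs[i]["id"], fused[i]) for i in order]


# Toy data (hypothetical): 2-d dense vectors and hand-picked term weights.
query = {"vec": [1.0, 0.0], "terms": {"检索": 1.0, "中文": 0.5}}
docs = [
    {"id": "d1", "vec": [0.9, 0.1], "terms": {"检索": 0.8, "中文": 0.6}},
    {"id": "d2", "vec": [0.1, 0.9], "terms": {"检索": 0.2}},
    {"id": "d3", "vec": [0.7, 0.7], "terms": {}},
]

ranking = hybrid_rank(query, docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this toy setup, d3 has no lexical overlap with the query yet still outranks d2 because of its dense similarity, which is the kind of complementarity hybrid retrieval aims to exploit.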
