IR · CL · Jul 11, 2022

Topic-Grained Text Representation-based Model for Document Retrieval

arXiv:2207.04656v1 · 3 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses storage efficiency in document retrieval systems, though the contribution is incremental because it builds on the existing representation-based matching paradigm.

The paper tackles the high storage cost of word-grained document representations in retrieval systems by introducing TGTR, a model that uses topic-grained representations. TGTR achieves competitive retrieval accuracy on TREC CAR and MS MARCO while requiring less than 1/10 of the storage space of word-grained baselines.

Document retrieval enables users to find the documents they need accurately and quickly. To meet retrieval efficiency requirements, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, this paradigm consumes vast local storage space, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, while significantly reducing storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results demonstrate that, compared to word-grained baselines, TGTR is consistently competitive on TREC CAR and MS MARCO in terms of retrieval accuracy, while requiring less than 1/10 of their storage space. Moreover, TGTR overwhelmingly surpasses global-grained baselines in terms of retrieval accuracy.
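The storage trade-off described in the abstract can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not taken from the paper: the corpus size, vector dimension, tokens per document, and topics per document are all hypothetical values chosen only to show how the number of stored vectors per document drives index size. It assumes that a word-grained index keeps one dense vector per token, a topic-grained index keeps one per topic, and a global-grained index keeps a single vector per document.

```python
def index_bytes(num_docs, vectors_per_doc, dim, bytes_per_value=4):
    """Bytes needed to pre-store `vectors_per_doc` dense float vectors
    of size `dim` for each of `num_docs` documents."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


# All values below are hypothetical, for illustration only.
NUM_DOCS = 1_000_000     # assumed corpus size
DIM = 128                # assumed embedding dimension
TOKENS_PER_DOC = 100     # word-grained: one vector per token (assumed)
TOPICS_PER_DOC = 8       # topic-grained: one vector per topic (assumed)

word_grained = index_bytes(NUM_DOCS, TOKENS_PER_DOC, DIM)
topic_grained = index_bytes(NUM_DOCS, TOPICS_PER_DOC, DIM)
global_grained = index_bytes(NUM_DOCS, 1, DIM)

# Fraction of the word-grained index size that the topic-grained index needs.
ratio = topic_grained / word_grained

print(f"word-grained:   {word_grained / 1e9:.2f} GB")
print(f"topic-grained:  {topic_grained / 1e9:.2f} GB")
print(f"global-grained: {global_grained / 1e9:.2f} GB")
print(f"topic / word ratio: {ratio:.2f}")
```

Under these assumed settings, the topic-grained index stays below one tenth of the word-grained one whenever the number of topics per document is less than a tenth of the number of tokens per document. The actual ratio reported by the authors depends on their model configuration, which this page does not specify.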

