IR · Jun 17, 2013

The Number of Terms and Documents for Pseudo-Relevant Feedback for Ad-hoc Information Retrieval

arXiv:1306.3955v1 · 3 citations
AI Analysis

This work addresses parameter tuning for query reformulation in Arabic information retrieval, but it is incremental as it applies an existing method to a specific language domain.

The study investigated how varying the number of documents (D) and terms (T) in pseudo-relevant feedback affects performance in an Arabic ad-hoc information retrieval system, finding that success depends on selecting enough documents and a small set of relevant terms, with some queries not improving through reformulation.

In an Information Retrieval System (IRS), Automatic Relevance Feedback (ARF) is a query reformulation technique that modifies the initial query without user intervention. It is applied mainly by adding terms drawn from external resources such as ontologies and/or from the results of the current search. In this context, we are mainly interested in the local analysis technique for ARF in an ad-hoc IRS on Arabic documents. In this article, we examine how varying the two parameters involved in this technique, namely the number of documents «D» and the number of terms «T», affects the performance of an Arabic IRS. The experiments, carried out on an Arabic text corpus, lead us to conclude that some queries are not easily improved by query reformulation. In addition, the success of ARF depends mainly on selecting a sufficient number of documents D and extracting a very small set of relevant terms T for retrieval.
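To make the roles of D and T concrete, here is a minimal sketch of local-analysis feedback: rank the documents for the initial query, keep the top D, and append the T highest-weighted new terms to the query. The abstract does not say which weighting scheme or preprocessing the authors used, so the TF-IDF scoring, the whitespace tokens, the toy corpus, and all function names below are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency per term (assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def rank(query_terms, docs, idf):
    """Indices of documents with a positive TF-IDF score, best first."""
    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        s = sum(tf[t] * idf.get(t, 0.0) for t in query_terms)
        if s > 0:
            scored.append((s, i))
    # Sort by descending score; break ties by document index.
    return [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


def expand_query(query_terms, docs, d, t):
    """Append the t best terms found in the top d documents to the query."""
    idf = idf_table(docs)
    top_docs = rank(query_terms, docs, idf)[:d]
    weights = Counter()
    for i in top_docs:
        for term, tf in Counter(docs[i]).items():
            if term not in query_terms:  # only genuinely new terms
                weights[term] += tf * idf[term]
    # Sort by descending weight; break ties alphabetically.
    best = sorted(weights.items(), key=lambda p: (-p[1], p[0]))[:t]
    return list(query_terms) + [term for term, _ in best]


# Hypothetical toy corpus (already tokenized) and query, for illustration only.
docs = [
    "arabic retrieval stemming stemming".split(),
    "arabic retrieval stemming morphology".split(),
    "football match result".split(),
    "weather forecast result".split(),
]
query = ["arabic", "retrieval"]

expanded = expand_query(query, docs, d=2, t=1)
```

With d=2 and t=1 the sketch adds only "stemming", the term shared by both top-ranked documents. Raising t to 2 also pulls in "morphology", which occurs in just one of them, showing how a larger T admits terms with weaker support in the feedback set.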

