IR · Mar 30, 2019

On the Estimation and Use of Statistical Modelling in Information Retrieval

arXiv:1904.00289v1 · 1.7 · h-index: 5

Originality: Incremental advance

AI Analysis

This thesis addresses unreliable retrieval performance in IR systems by replacing distributional assumptions with a statistically principled method. The advance is incremental because it builds on existing statistical modelling.

The thesis argues that distributional assumptions in information retrieval can lead to incorrect conclusions. It proposes a statistically principled method for determining the "true" distribution and uses that method to derive new ranking models, whose results are on par with or better than multiple strong baselines on several TREC collections.

Several tasks in information retrieval (IR) rely on assumptions regarding the distribution of some property (such as term frequency) in the data being processed. This thesis argues that such distributional assumptions can lead to incorrect conclusions and proposes a statistically principled method for determining the "true" distribution. This thesis further applies this method to derive a new family of ranking models that adapt their computations to the statistics of the data being processed. Experimental evaluation shows results on par with or better than multiple strong baselines on several TREC collections. Overall, this thesis concludes that distributional assumptions can be replaced with an effective, efficient and principled method for determining the "true" distribution and that using the "true" distribution can lead to improved retrieval performance.
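The abstract does not spell out how the "true" distribution is determined, so the following is only a minimal sketch of the general idea: fit several candidate distributions to observed term frequencies and keep the one the data supports best. Everything in it is an assumption made for illustration: the two candidates (Poisson and geometric), the use of maximum likelihood with the Akaike information criterion (AIC) as the selection rule, and the function names `fit_poisson`, `fit_geometric` and `select_distribution`. It should not be read as the thesis's actual procedure.

```python
import math


def fit_poisson(counts):
    """Maximum-likelihood Poisson fit; returns (parameters, log-likelihood)."""
    lam = sum(counts) / len(counts)  # the MLE of the rate is the sample mean
    ll = sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)
    return {"lambda": lam}, ll


def fit_geometric(counts):
    """Maximum-likelihood geometric fit on k = 0, 1, 2, ...

    Uses P(k) = (1 - p)^k * p, whose MLE is p = 1 / (1 + mean).
    """
    mean = sum(counts) / len(counts)
    p = 1.0 / (1.0 + mean)
    ll = sum(k * math.log(1.0 - p) + math.log(p) for k in counts)
    return {"p": p}, ll


def select_distribution(counts):
    """Return (best candidate name, AIC per candidate) for non-negative counts.

    AIC is an illustrative selection rule chosen for this sketch,
    not necessarily the criterion used in the thesis.
    """
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one non-zero count")
    candidates = {"poisson": fit_poisson, "geometric": fit_geometric}
    scores = {}
    for name, fit in candidates.items():
        params, ll = fit(counts)
        scores[name] = 2 * len(params) - 2 * ll  # AIC = 2k - 2 ln L
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical term-frequency sample with a long right tail.
heavy_tailed = [0] * 50 + [1] * 25 + [2] * 12 + [3] * 6 + [4] * 3 + [5] * 2 + [6, 7]
best, scores = select_distribution(heavy_tailed)
print(best, scores)
```

On this made-up sample the geometric candidate obtains the lower AIC, whereas a sample concentrated around its mean would favour the Poisson. A ranking model in the spirit of the abstract would then plug the selected distribution into its scoring function instead of fixing one distribution in advance.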
