IR | Jul 28, 2020

Declarative Experimentation in Information Retrieval using PyTerrier

arXiv:2007.14271v1 | 170 citations
AI Analysis

This paper addresses the problem of cumbersome, non-expressive IR experimentation for researchers. The contribution is incremental, since it builds on existing IR platforms such as Anserini and Terrier.

The authors tackled the lack of a high-level formalism for expressing information retrieval (IR) pipelines by proposing PyTerrier, a declarative framework that compiles and optimizes retrieval experiments, demonstrating efficiency benefits on TREC Robust and ClueWeb09 test collections.

The advent of deep machine learning platforms such as TensorFlow and PyTorch, developed in expressive high-level languages such as Python, has allowed more expressive representations of deep neural network architectures. We argue that such a powerful formalism is missing in information retrieval (IR), and propose a framework called PyTerrier that allows advanced retrieval pipelines to be expressed, and evaluated, in a declarative manner close to their conceptual design. Like the aforementioned frameworks that compile deep learning experiments into primitive GPU operations, our framework targets IR platforms as backends in order to execute and evaluate retrieval pipelines. Further, we can automatically optimise the retrieval pipelines to increase their efficiency to suit a particular IR platform backend. Our experiments, conducted on TREC Robust and ClueWeb09 test collections, demonstrate the efficiency benefits of these optimisations for retrieval pipelines involving both the Anserini and Terrier IR platforms.
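The idea of declaring a pipeline as an expression and then rewriting it before execution can be illustrated with a small pure-Python sketch. This is not PyTerrier's API or implementation: the classes `Retriever`, `Cutoff`, `Then`, and `LengthPenalty`, the `optimise` function, and the toy term-count scoring are all hypothetical names and behaviour introduced here for illustration. The sketch borrows only the notation of `>>` for composition and `%` for rank cutoff, and the single rewrite it applies (folding a rank cutoff into the retriever so fewer results are requested from the backend) is an assumed example of the kind of optimisation the abstract describes.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

Ranking = List[Tuple[str, float]]  # (docid, score) pairs, best first


class Transformer:
    """Hypothetical base class for a pipeline stage."""

    def run(self, query: str, ranking: Optional[Ranking] = None) -> Ranking:
        raise NotImplementedError

    def __rshift__(self, other: "Transformer") -> "Transformer":
        return Then(self, other)      # a >> b: feed a's output into b

    def __mod__(self, k: int) -> "Transformer":
        return Cutoff(self, k)        # a % k: keep only the top k results


@dataclass
class Retriever(Transformer):
    """Toy retriever: scores documents by counting query-term occurrences."""
    docs: Dict[str, str]
    num_results: int = 1000

    def run(self, query, ranking=None):
        terms = query.lower().split()
        scored = []
        for docid, text in self.docs.items():
            tokens = text.lower().split()
            score = float(sum(tokens.count(t) for t in terms))
            if score > 0:
                scored.append((docid, score))
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[: self.num_results]


@dataclass
class LengthPenalty(Transformer):
    """Toy re-ranker: divides each score by the document length in tokens."""
    docs: Dict[str, str]

    def run(self, query, ranking=None):
        rescored = [(d, s / len(self.docs[d].split())) for d, s in (ranking or [])]
        rescored.sort(key=lambda pair: (-pair[1], pair[0]))
        return rescored


@dataclass
class Then(Transformer):
    left: Transformer
    right: Transformer

    def run(self, query, ranking=None):
        return self.right.run(query, self.left.run(query, ranking))


@dataclass
class Cutoff(Transformer):
    inner: Transformer
    k: int

    def run(self, query, ranking=None):
        return self.inner.run(query, ranking)[: self.k]


def optimise(node: Transformer) -> Transformer:
    """Rewrite the expression tree: fold Cutoff(Retriever, k) into the retriever."""
    if isinstance(node, Then):
        return Then(optimise(node.left), optimise(node.right))
    if isinstance(node, Cutoff):
        inner = optimise(node.inner)
        if isinstance(inner, Retriever):
            return replace(inner, num_results=min(inner.num_results, node.k))
        return Cutoff(inner, node.k)
    return node


DOCS = {
    "d1": "terrier retrieval platform",
    "d2": "retrieval retrieval experiments in python",
    "d3": "deep learning frameworks",
}

# Declare the pipeline as an expression, then rewrite it before running.
pipeline = (Retriever(DOCS) % 1) >> LengthPenalty(DOCS)
optimised = optimise(pipeline)

# The rewrite must not change the results, only how they are obtained.
assert pipeline.run("retrieval") == optimised.run("retrieval")
```

Because the pipeline is a data structure and not a sequence of imperative calls, the rewrite can inspect and restructure it before any retrieval happens. In a real system the benefit would come from the backend doing less work when asked for fewer results; the toy retriever here scores every document regardless, so it shows only the shape of the rewrite, not the speed-up.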

Code Implementations (9 repos)

Data from Papers with Code (CC-BY-SA-4.0)

