IR · May 5, 2021

WTR: A Test Collection for Web Table Retrieval

arXiv:2105.02354v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

WTR provides a benchmark for information retrieval researchers. The contribution is incremental: it extends prior test collections by adding relevance judgments for table context.

The authors address the lack of comprehensive test collections for Web table retrieval by building WTR, a dataset drawn from the Common Crawl with relevance judgments for both tables and their context. They show that using the context labels can improve existing retrieval methods.

We describe the development, characteristics and availability of a test collection for the task of Web table retrieval, built on the large-scale Web Table Corpora extracted from the Common Crawl. Because a Web table usually comes with rich context information, such as the page title and surrounding paragraphs, we provide relevance judgments not only for query-table pairs but also for query-table context pairs, which previous test collections ignore. To facilitate future research with this benchmark, we describe how the dataset is pre-processed and report baseline results for both traditional and recently proposed table retrieval methods. Our experimental results show that proper use of the context labels can benefit existing table retrieval methods.
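A test collection like this is typically consumed by scoring a system's ranked list against graded relevance judgments. The sketch below assumes the judgments are stored in the common TREC qrels layout (`<qid> <iter> <doc_id> <grade>`) and evaluates with NDCG@k; neither the file layout, the grade scale, nor the choice of metric is confirmed by the text above, and the query and table IDs are hypothetical. The same functions could be applied unchanged to a separate set of query-context judgments.

```python
import math
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrels lines of the form '<qid> <iter> <doc_id> <grade>'.

    Returns {qid: {doc_id: grade}}. Assumed layout, not WTR's documented format.
    """
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, doc_id, grade = parts
        qrels[qid][doc_id] = int(grade)
    return qrels


def dcg(grades):
    """Discounted cumulative gain with exponential gain (2^rel - 1) / log2(rank + 1)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg_at_k(ranked_ids, judged, k):
    """NDCG@k of one ranked list; unjudged documents count as non-relevant."""
    gains = [judged.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(judged.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one query and a hypothetical system run.
table_qrels = parse_qrels([
    "q1 0 table-A 2",
    "q1 0 table-B 1",
    "q1 0 table-C 0",
])
run = ["table-B", "table-A", "table-D"]  # table-D is unjudged
score = ndcg_at_k(run, table_qrels["q1"], k=3)
print(f"NDCG@3 = {score:.4f}")
```

Exponential gain is used here because it rewards placing highly relevant tables first more strongly than linear gain; a linear-gain variant would only change the numerator in `dcg`.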

Code Implementations: 1 repo