CL | Mar 18, 2022

RELiC: Retrieving Evidence for Literary Claims

DeepMind
arXiv:2203.10053v1 | 645 citations | h-index: 48
Originality: Incremental advance
AI Analysis

This work addresses the need for automated evidence retrieval in humanities scholarship, though the contribution is incremental because it builds on existing retrieval methods.

The authors tackle literary evidence retrieval by introducing a dataset of 78K quotations and a corresponding retrieval task; their RoBERTa-based dense retriever outperforms existing pretrained retrieval baselines but still leaves substantial room for improvement.

Humanities scholars commonly provide evidence for claims that they make about a work of literature (e.g., a novel) in the form of quotations from the work. We collect a large-scale dataset (RELiC) of 78K literary quotations and surrounding critical analysis and use it to formulate the novel task of literary evidence retrieval, in which models are given an excerpt of literary analysis surrounding a masked quotation and asked to retrieve the quoted passage from the set of all passages in the work. Solving this retrieval task requires a deep understanding of complex literary and linguistic phenomena, which proves challenging to methods that overwhelmingly rely on lexical and semantic similarity matching. We implement a RoBERTa-based dense passage retriever for this task that outperforms existing pretrained information retrieval baselines; however, experiments and analysis by human domain experts indicate that there is substantial room for improvement over our dense retriever.
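The task framing in the abstract (mask a quotation, keep the surrounding analysis as the query, rank every passage of the work) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method: `encode` is a hypothetical bag-of-words stand-in for the RoBERTa encoders, chosen only so the example runs without model weights, and the abstract itself notes that lexical matching of this kind is exactly what struggles on the real task. The helper names (`build_query`, `rank_passages`, `recall_at_k`), the `[MASK]` token, the recall@k metric, and the toy passages are all assumptions made for illustration.

```python
import re
from collections import Counter


# HYPOTHETICAL stand-in encoder: a sparse bag-of-words vector.
# The paper's retriever uses RoBERTa-based dense encoders; this toy
# version only illustrates the query/passage ranking interface.
def encode(text: str) -> Counter:
    return Counter(re.findall(r"[a-z']+", text.lower()))


def dot(u: Counter, v: Counter) -> float:
    # Similarity score; a dense retriever would take the dot product
    # of two fixed-size embedding vectors instead.
    return float(sum(count * v[tok] for tok, count in u.items()))


def build_query(left: str, right: str, mask: str = "[MASK]") -> str:
    # The quotation is hidden; the model sees only the surrounding analysis.
    return f"{left} {mask} {right}"


def rank_passages(query: str, passages: list[str]) -> list[int]:
    q = encode(query)
    scores = [dot(q, encode(p)) for p in passages]
    # Highest score first; ties are broken by original passage order.
    return sorted(range(len(passages)), key=lambda i: (-scores[i], i))


def recall_at_k(ranked: list[int], gold: int, k: int) -> float:
    # 1.0 if the quoted (gold) passage appears among the top-k candidates.
    return 1.0 if gold in ranked[:k] else 0.0


# Invented toy "work" split into passages; index 1 is the masked quotation.
passages = [
    "The ship left the harbour at dawn.",
    "She was a woman of mean understanding, little information, and uncertain temper.",
    "Rain fell on the empty garden all afternoon.",
]
query = build_query(
    "The narrator sums up Mrs. Bennet in one line:",
    "a verdict on her temper and understanding.",
)
ranked = rank_passages(query, passages)
print(ranked)  # [1, 0, 2]
print(recall_at_k(ranked, gold=1, k=1))  # 1.0
```

Swapping `encode` for a neural encoder that maps text to dense vectors (and `dot` for a vector dot product) would turn this into a dense passage retriever of the general kind the abstract describes; the masking, ranking, and top-k evaluation steps stay the same.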

Code Implementations: 1 repo
