CLIR · Jul 31, 2023

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

arXiv:2307.16883v1 · 70 citations · h-index: 87
Originality: Synthesis-oriented
AI Analysis

This paper addresses a problem relevant to AI researchers: building search models with better attribution. The contribution is incremental, since it builds on existing datasets and methods.

The paper tackles the lack of openly accessible datasets for building generative information-seeking models with attribution. It introduces HAGRID, a dataset built on MIRACL through human-LLM collaboration: an LLM generates attributed explanations, and human annotators evaluate them for informativeness and attributability.

The rise of large language models (LLMs) had a transformative impact on search, ushering in a new era of search engines that are capable of generating search results in natural language text, imbued with citations for supporting sources. Building generative information-seeking models demands openly accessible datasets, which currently remain lacking. In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations. Unlike recent efforts that focus on human evaluation of black-box proprietary search engines, we built our dataset atop the English subset of MIRACL, a publicly available information retrieval dataset. HAGRID is constructed based on human and LLM collaboration. We first automatically collect attributed explanations that follow an in-context citation style using an LLM, i.e. GPT-3.5. Next, we ask human annotators to evaluate the LLM explanations based on two criteria: informativeness and attributability. HAGRID serves as a catalyst for the development of information-seeking models with better attribution capabilities.
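The in-context citation style and the two human-judged criteria described in the abstract can be pictured with a small Python sketch. Everything here is an assumption made for illustration rather than the dataset's actual schema: the bracketed `[n]` citation format, the `AttributedExplanation` record, and its field names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed citation style: bracketed quote numbers such as "[1]" or "[2][3]".
CITATION = re.compile(r"\[(\d+)\]")


@dataclass
class AttributedExplanation:
    """Hypothetical record: one LLM explanation plus optional human labels."""
    query: str
    quotes: Dict[int, str]                 # quote id -> quote text
    explanation: str                       # LLM output with in-context citations
    informative: Optional[bool] = None     # human label, None if not judged
    attributable: Optional[bool] = None    # human label, None if not judged


def cited_ids(explanation: str) -> List[int]:
    """Return cited quote ids in order of first appearance, without duplicates."""
    seen: List[int] = []
    for match in CITATION.finditer(explanation):
        quote_id = int(match.group(1))
        if quote_id not in seen:
            seen.append(quote_id)
    return seen


def dangling_citations(record: AttributedExplanation) -> List[int]:
    """Return cited ids that do not correspond to any retrieved quote."""
    return [qid for qid in cited_ids(record.explanation) if qid not in record.quotes]


record = AttributedExplanation(
    query="example query",
    quotes={1: "first retrieved quote", 2: "second retrieved quote"},
    explanation="A first claim [1]. A second claim [2][3].",
)
print(cited_ids(record.explanation))   # ids cited by the explanation
print(dangling_citations(record))      # ids with no matching quote
```

A check like `dangling_citations` only verifies that each citation points at a retrieved quote. It does not decide whether the quote actually supports the claim, which is the judgment the abstract assigns to human annotators.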

Code Implementations: 1 repo