CLAIIR
May 3, 2024

Attribution in Scientific Literature: New Benchmark and Methods

arXiv:2405.02228v3
12 citations
h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reliable attribution for scientific communication using LLMs, offering a benchmark to improve trustworthiness, though it is incremental in enhancing existing methods.

The paper tackles automated source citation in scientific literature by introducing REASONS, a new dataset with sentence-level annotations across 12 domains. It finds that top-tier LLMs achieve high performance on sentence attribution but still show high hallucination rates. A metadata-augmented approach reduces these rates, and RAG with Mistral cuts hallucination by 42% on indirect queries.

Large language models (LLMs) present a promising yet challenging frontier for automated source citation in scientific communication. Previous approaches to citation generation have been limited by citation ambiguity and LLM overgeneralization. We introduce REASONS, a novel dataset with sentence-level annotations across 12 scientific domains from arXiv. Our evaluation framework covers two key citation scenarios: indirect queries (matching sentences to paper titles) and direct queries (author attribution), both enhanced with contextual metadata. We conduct extensive experiments with models such as GPT-O1, GPT-4O, GPT-3.5, DeepSeek, and other smaller models like Perplexity AI (7B). While top-tier LLMs achieve high performance in sentence attribution, they struggle with high hallucination rates, a key metric for scientific reliability. Our metadata-augmented approach reduces hallucination rates across all tasks, offering a promising direction for improvement. Retrieval-augmented generation (RAG) with Mistral improves performance in indirect queries, reducing hallucination rates by 42% and maintaining competitive precision with larger models. However, adversarial testing highlights challenges in linking paper titles to abstracts, revealing fundamental limitations in current LLMs. REASONS provides a challenging benchmark for developing reliable and trustworthy LLMs in scientific applications.
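
To make the indirect-query scenario concrete, here is a minimal sketch of how sentence-to-title predictions might be scored. It is an illustration only, not the paper's evaluation code: the names `IndirectQuery`, `normalize`, and `score_indirect` are hypothetical, the toy titles are placeholders, and the definition of a hallucination used here (a predicted title that matches no paper in the corpus) is an assumption, since the abstract does not state how the metric is computed.

```python
from dataclasses import dataclass


@dataclass
class IndirectQuery:
    sentence: str         # sentence taken from a paper
    gold_title: str       # title of the paper the sentence should be attributed to
    predicted_title: str  # title returned by the model


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(title.lower().split())


def score_indirect(queries, known_titles):
    """Return (accuracy, hallucination_rate) under the assumed definitions."""
    known = {normalize(t) for t in known_titles}
    correct = 0
    hallucinated = 0
    for q in queries:
        pred = normalize(q.predicted_title)
        if pred == normalize(q.gold_title):
            correct += 1
        elif pred not in known:
            # Assumption: a prediction matching no title in the corpus
            # counts as a hallucination; a wrong but real title does not.
            hallucinated += 1
    n = len(queries)
    if n == 0:
        return 0.0, 0.0
    return correct / n, hallucinated / n


# Toy data with placeholder titles (not taken from REASONS).
titles = ["Paper A", "Paper B", "Paper C"]
queries = [
    IndirectQuery("sentence 1", "Paper A", "paper a"),  # correct
    IndirectQuery("sentence 2", "Paper B", "Paper B"),  # correct
    IndirectQuery("sentence 3", "Paper C", "Paper A"),  # wrong, but a real title
    IndirectQuery("sentence 4", "Paper C", "Paper Z"),  # title not in the corpus
]
accuracy, hallucination_rate = score_indirect(queries, titles)
print(accuracy, hallucination_rate)
```

Separating wrong-but-real titles from nonexistent ones is one possible design choice; a stricter metric could count every incorrect attribution as a hallucination, which would raise the reported rate.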

