CL · Jul 23, 2017

Fine Grained Citation Span for References in Wikipedia

arXiv:1707.07278v11088 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for verifiability in Wikipedia by helping editors identify content that still lacks citations, though it is an incremental application of existing methods to a new domain.

The paper tackles the problem of determining the citation span in Wikipedia articles to identify what content is covered by citations, and it shows improvement in all evaluation metrics compared to baselines from the scientific domain.

Verifiability is one of the core editing principles in Wikipedia, editors being encouraged to provide citations for the added content. For a Wikipedia article, determining the citation span of a citation, i.e. what content is covered by a citation, is important as it helps decide for which content citations are still missing. We are the first to address the problem of determining the citation span in Wikipedia articles. We approach this problem by classifying which textual fragments in an article are covered by a citation. We propose a sequence classification approach where for a paragraph and a citation, we determine the citation span at a fine-grained level. We provide a thorough experimental evaluation and compare our approach against baselines adopted from the scientific domain, where we show improvement for all evaluation metrics.
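To make the task framing concrete, the following is a minimal sketch of how a citation span could be represented as one binary label per text fragment, and how a predicted span could be scored against a gold span. It is an illustration only, not the paper's model: the punctuation-based splitting rule, the function names `split_fragments` and `span_scores`, and the choice of precision, recall, and F1 as metrics are all assumptions made here.

```python
import re
from typing import Dict, List


def split_fragments(paragraph: str) -> List[str]:
    """Split a paragraph into fragments at '.', ';' or ',' followed by whitespace.

    Assumption: this punctuation rule stands in for whatever fine-grained
    fragment definition the paper actually uses.
    """
    parts = re.split(r"(?<=[.;,])\s+", paragraph.strip())
    return [p for p in parts if p]


def span_scores(gold: List[int], pred: List[int]) -> Dict[str, float]:
    """Score a predicted citation span against a gold span.

    Both inputs hold one label per fragment: 1 = covered by the citation,
    0 = not covered. Returns precision, recall, and F1 for the covered class.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have one label per fragment")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example paragraph and labels, invented for illustration.
paragraph = ("The bridge opened in 1932, and it was widened in 1960. "
             "It remains in daily use.")
fragments = split_fragments(paragraph)
gold = [1, 1, 0]   # the citation covers the first two fragments
pred = [1, 0, 0]   # a classifier that found only the first fragment
scores = span_scores(gold, pred)
```

A sequence classifier in the sense of the abstract would produce the `pred` list for a given paragraph and citation; the sketch deliberately leaves that model out and shows only the input and output representation.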

