CL Apr 26, 2022

Symlink: A New Dataset for Scientific Symbol-Description Linking

arXiv:2204.12070v1, 4 citations, h-index: 41
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of linking mathematical symbols to their descriptions in scientific documents, which is relevant to researchers in fields such as computer science and physics. The contribution is incremental, since it focuses on dataset creation.

The authors introduce Symlink, a new large-scale dataset for extracting symbols and their descriptions from scientific documents across five domains. Their experiments show that the task is challenging for existing models, which calls for further research.

Mathematical symbols and descriptions appear in various forms across document section boundaries without explicit markup. In this paper, we present a new large-scale dataset that emphasizes extracting symbols and descriptions in scientific documents. Symlink annotates scientific papers of 5 different domains (i.e., computer science, biology, physics, mathematics, and economics). Our experiments on Symlink demonstrate the challenges of the symbol-description linking task for existing models and call for further research effort in this area. We will publicly release Symlink to facilitate future research.
