CL · LG · Dec 14, 2020

Primer AI's Systems for Acronym Identification and Disambiguation

arXiv:2012.08013v2
AI Analysis

This work addresses the problem of ambiguous acronyms in scientific documents, which hinders understanding for both humans and machines.

The paper introduces new methods for acronym identification and disambiguation in scientific documents. Both systems achieve significant performance gains over previously suggested methods and perform competitively on the SDU@AAAI-21 shared task leaderboard.

The prevalence of ambiguous acronyms makes scientific documents harder to understand for humans and machines alike, presenting a need for models that can automatically identify acronyms in text and disambiguate their meaning. We introduce new methods for acronym identification and disambiguation: our acronym identification model projects learned token embeddings onto tag predictions, and our acronym disambiguation model finds training examples whose sentence embeddings are similar to those of test examples. Both of our systems achieve significant performance gains over previously suggested methods and perform competitively on the SDU@AAAI-21 shared task leaderboard. Our models were trained in part on new distantly-supervised datasets for these tasks, which we call AuxAI and AuxAD. We also identified a duplication conflict issue in the SciAD dataset and formed a deduplicated version of SciAD that we call SciAD-dedupe. We have publicly released all three of these datasets and hope that they help the community make further strides in scientific document understanding.
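The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tag_tokens`, `disambiguate`), the tag labels, the hand-written weight matrix, and the two-dimensional toy embeddings are all assumptions made for illustration. In a real system the embeddings would come from a trained encoder and the projection weights would be learned.

```python
import math

def tag_tokens(token_embeddings, weights, tags):
    """Project each token embedding onto per-tag scores with a linear map
    (one weight row per tag), then pick the highest-scoring tag."""
    predictions = []
    for emb in token_embeddings:
        scores = [sum(w * x for w, x in zip(row, emb)) for row in weights]
        predictions.append(tags[scores.index(max(scores))])
    return predictions

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def disambiguate(test_embedding, train_examples):
    """Return the expansion of the training example whose sentence
    embedding is most similar to the test sentence embedding.
    train_examples: list of (embedding, expansion) pairs for one acronym."""
    best = max(train_examples, key=lambda ex: cosine(test_embedding, ex[0]))
    return best[1]

# Hypothetical tag set and toy weights (assumptions, not from the paper).
TAGS = ["O", "B-short", "B-long"]
WEIGHTS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
predicted_tags = tag_tokens([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], WEIGHTS, TAGS)

# Hypothetical training sentences for the acronym "CNN" with toy embeddings.
TRAIN = [
    ([1.0, 0.0], "convolutional neural network"),
    ([0.0, 1.0], "Cable News Network"),
]
predicted_expansion = disambiguate([0.9, 0.1], TRAIN)
```

In this toy setup the nearest-neighbour lookup stands in for the disambiguation step: the test sentence inherits the expansion of its most similar training sentence, so the quality of the result depends entirely on the sentence embeddings.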
