Inline Citation Classification using Peripheral Context and Time-evolving Augmentation
This work addresses the immature state of inline citation classification, which is important for researchers analyzing scientific literature, by providing a more comprehensive dataset and model, though it is incremental in nature.
The paper tackles the problem of inline citation classification by introducing a new dataset (3Cext) that includes peripheral sentences and domain knowledge, and proposes a Transformer-based model (PeriCite) that achieves state-of-the-art performance with a +0.09 F1 improvement over the best baseline.
Citations play a pivotal role in determining the associations among research articles. They convey essential information about indicative, supportive, or contrastive studies. The task of inline citation classification aids in extrapolating these relationships; however, existing studies are still immature and demand further scrutiny. Current datasets and methods for inline citation classification use only citation-marked sentences, which forces the model to ignore domain knowledge and neighboring contextual sentences. In this paper, we propose a new dataset, named 3Cext, which, along with the cited sentences, provides domain information as well as discourse information from the vicinal sentences for analyzing contrasting and entailing relationships. We propose PeriCite, a Transformer-based deep neural network that fuses peripheral sentences and domain knowledge. Our model achieves state-of-the-art performance on the 3Cext dataset, improving F1 by +0.09 over the best baseline. We conduct extensive ablations to analyze the efficacy of the proposed dataset and model fusion methods.