CL, IR · Oct 13, 2020

Aspect-based Document Similarity for Research Papers

arXiv:2010.06395v11002 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for finer-grained similarity measures in applications like recommender systems for research papers, though it is incremental as it builds on existing Transformer models.

The paper tackles the coarse granularity of document similarity by introducing aspect-based similarity for research papers: the title of the section in which a citation occurs labels the pair of citing and cited paper. Among the Transformer models evaluated, SciBERT performs best on two datasets of 172,073 paper pairs.

Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity for research papers. Paper citations indicate the aspect-based similarity, i.e., the section title in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. Our results show SciBERT as the best performing system. A qualitative examination validates our quantitative results. Our findings motivate future research of aspect-based document similarity and the development of a recommender system based on the evaluated techniques. We make our datasets, code, and trained models publicly available.
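The labeling idea in the abstract, where the section title in which a citation occurs labels the pair of citing and cited paper, can be illustrated with a minimal sketch. This is not the authors' released code: the record layout, the paper IDs, the `normalize` helper, and the choice to collect several section titles per pair are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical citation records: (citing_id, cited_id, section_title).
# IDs and section titles are made up for illustration only.
citations = [
    ("paper_a", "paper_b", "Related Work"),
    ("paper_a", "paper_b", "Experiments"),
    ("paper_a", "paper_c", "Introduction"),
    ("paper_d", "paper_b", "related work "),
]


def normalize(title: str) -> str:
    """Assumed normalization: trim whitespace and lowercase the title."""
    return title.strip().lower()


def build_pair_labels(records):
    """Map each (citing, cited) pair to the set of section titles it was cited in."""
    labels = defaultdict(set)
    for citing, cited, section in records:
        labels[(citing, cited)].add(normalize(section))
    return dict(labels)


pair_labels = build_pair_labels(citations)
for pair, aspects in sorted(pair_labels.items()):
    print(pair, sorted(aspects))
```

Each resulting entry is one training example for a pairwise classifier: the two papers form the input and the section titles form the target aspects. How titles are normalized and which label classes are kept would depend on the datasets the authors released.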

Code Implementations: 1 repo