Predicting Long-Term Citations from Short-Term Linguistic Influence
The resulting measure gives researchers and bibliometricians a more content-based view of influence than citation counts alone, though the contribution is incremental because it builds on existing citation prediction methods.
The paper tackles the problem of predicting long-term citations by quantifying linguistic influence in research papers. It shows that linguistic influence estimated from the first two years after publication is correlated with, and predictive of, citation counts in the following three years.
A standard measure of the influence of a research paper is the number of times it is cited. However, papers may be cited for many reasons, and citation count offers limited information about the extent to which a paper affected the content of subsequent publications. We therefore propose a novel method to quantify linguistic influence in timestamped document collections. There are two main steps: first, identify lexical and semantic changes using contextual embeddings and word frequencies; second, aggregate information about these changes into per-document influence scores by estimating a high-dimensional Hawkes process with a low-rank parameter matrix. We show that this measure of linguistic influence is predictive of $\textit{future}$ citations: the estimate of linguistic influence from the two years after a paper's publication is correlated with and predictive of its citation count in the following three years. This is demonstrated using an online evaluation with incremental temporal training/test splits, in comparison with a strong baseline that includes predictors for initial citation counts, topics, and lexical features.
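To make the second step of the abstract more concrete, the sketch below shows what a multivariate Hawkes intensity with a low-rank excitation matrix can look like. It is not the authors' implementation. The exponential decay kernel, the factorization A = U V^T, the decay rate `beta`, and every function name here (`low_rank_excitation`, `intensity`, `outgoing_influence`) are assumptions chosen for illustration, as is the idea of scoring a dimension by the total excitation it sends to the others.

```python
import math
from typing import List, Sequence, Tuple

# (timestamp, index of the dimension in which the event occurred)
Event = Tuple[float, int]


def low_rank_excitation(U: Sequence[Sequence[float]],
                        V: Sequence[Sequence[float]]) -> List[List[float]]:
    """Build A = U V^T, where A[i][j] is how strongly an event in
    dimension j raises the event rate of dimension i.
    With rank r, this needs 2 * n * r parameters instead of n * n."""
    return [[sum(u * v for u, v in zip(U[i], V[j])) for j in range(len(V))]
            for i in range(len(U))]


def intensity(i: int, t: float, mu: Sequence[float],
              A: Sequence[Sequence[float]], events: Sequence[Event],
              beta: float = 1.0) -> float:
    """Hawkes intensity with an (assumed) exponential kernel:
    lambda_i(t) = mu_i + sum over past events (t_k, j_k), t_k < t,
                  of A[i][j_k] * exp(-beta * (t - t_k))."""
    excitation = sum(A[i][j] * math.exp(-beta * (t - t_k))
                     for t_k, j in events if t_k < t)
    return mu[i] + excitation


def outgoing_influence(A: Sequence[Sequence[float]], j: int) -> float:
    """Illustrative score only: total excitation that dimension j
    sends to every other dimension (self-excitation excluded)."""
    return sum(A[i][j] for i in range(len(A)) if i != j)


# Hypothetical toy setup: three dimensions, rank-1 factors.
U = [[1.0], [0.5], [0.0]]
V = [[0.2], [0.4], [0.6]]
A = low_rank_excitation(U, V)
mu = [0.1, 0.1, 0.1]
events = [(0.0, 2), (1.0, 1)]

print(intensity(0, 2.0, mu, A, events))
print([outgoing_influence(A, j) for j in range(3)])
```

In this toy setup, dimension 2 has the largest entries in its column of A, so under the assumed scoring it is the most influential source. A rank-1 factorization is used only to keep the example small; the point of the low-rank form is that the number of parameters grows linearly rather than quadratically with the number of dimensions.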