Hidden Citations Obscure True Impact in Science
This work addresses a limitation of bibliometric measures: they do not capture the true scientific impact of a discovery, which is a problem for researchers and policymakers who rely on citation-based metrics.
The paper tackles hidden citations, cases where an influential discovery is credited in the text but not formally referenced. It applies unsupervised interpretable machine learning to full texts to identify them systematically, and finds that hidden citations often outnumber formal citations for such discoveries across disciplines.
References, the mechanism scientists rely on to signal previous knowledge, have lately turned into widely used and misused measures of scientific impact. Yet, when a discovery becomes common knowledge, citations suffer from obliteration by incorporation. This leads to the concept of hidden citation, representing a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber citation counts, emerging regardless of publishing venue and discipline. We show that the prevalence of hidden citations is not driven by citation counts, but rather by the degree of the discourse on the topic within the text of the manuscripts, indicating that the more a discovery is discussed, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.
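The definition of a hidden citation can be illustrated with a minimal Python sketch. It is not the authors' method: plain phrase matching stands in for the unsupervised machine learning step, which the abstract does not detail. The `Paper` record, the `count_citations` helper, the catchphrase, and the publication IDs are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Toy record: full text plus the IDs of the works it formally references."""
    text: str
    references: set = field(default_factory=set)


def count_citations(papers, catchphrases, foundational_ids):
    """Return (formal, hidden) counts for one discovery.

    formal: papers that reference at least one foundational publication.
    hidden: papers that mention a catchphrase in the text but reference
            none of the foundational publications.
    Assumption: the catchphrases are given; here they are matched as
    case-insensitive substrings.
    """
    phrases = [p.lower() for p in catchphrases]
    formal = hidden = 0
    for paper in papers:
        cites = bool(paper.references & foundational_ids)
        mentions = any(p in paper.text.lower() for p in phrases)
        if cites:
            formal += 1
        elif mentions:
            hidden += 1
    return formal, hidden


# Hypothetical corpus: one hidden citation, one formal citation, one unrelated paper.
papers = [
    Paper("We exploit the Example effect to explain the data.", set()),
    Paper("The Example effect [3] accounts for this trend.", {"paper:F1"}),
    Paper("We study an unrelated phenomenon.", {"paper:X9"}),
]
formal, hidden = count_citations(papers, ["Example effect"], {"paper:F1"})
print(formal, hidden)
```

In this toy corpus the first paper credits the discovery in its text without referencing it, so it counts as a hidden citation, while a reference-list analysis alone would register only the second paper.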