Mining Scientific Papers for Bibliometrics: a (very) Brief Survey of Methods and Tools
This incremental survey is aimed at researchers in bibliometrics and computational linguistics and addresses the problem of integrating bibliometrics with computational methods.
It reviews methods and tools for mining scientific papers through large-scale text analytics and sense mining, with the goal of bridging bibliometrics with computational linguistics and natural language processing.
The Open Access movement in scientific publishing and search engines such as Google Scholar have made scientific articles more broadly accessible. Over the last decade, full-text scientific papers have become increasingly available thanks to the growing number of publications on online platforms such as arXiv and CiteSeer. Efforts to provide articles in machine-readable formats, together with the rise of Open Access publishing, have produced a number of standardized formats for scientific papers (such as NLM-JATS, TEI, and DocBook). Our aim is to stimulate research at the intersection of Bibliometrics and Computational Linguistics in order to study how Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thereby exploring the interdisciplinary ground between Bibliometrics and Natural Language Processing.
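To illustrate why machine-readable formats such as NLM-JATS matter for bibliometrics, the following is a minimal sketch, not a method proposed in this survey. It assumes a heavily trimmed, hypothetical JATS document with no namespaces and uses only Python's standard `xml.etree.ElementTree` to pull out the article title, the reference list, and how often each reference is cited in the text. The function name `extract_citation_data` and the sample content are illustrative assumptions; real JATS files carry far more metadata and markup variation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed NLM-JATS article (assumption: no namespaces,
# one <mixed-citation> per reference). Real files are much richer.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Paper</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <p>As shown in <xref ref-type="bibr" rid="r1">[1]</xref> and
       <xref ref-type="bibr" rid="r2">[2]</xref>, and again in
       <xref ref-type="bibr" rid="r1">[1]</xref>.</p>
  </body>
  <back>
    <ref-list>
      <ref id="r1"><mixed-citation>First cited work.</mixed-citation></ref>
      <ref id="r2"><mixed-citation>Second cited work.</mixed-citation></ref>
    </ref-list>
  </back>
</article>"""


def extract_citation_data(jats_xml: str) -> dict:
    """Return the title, reference ids, and in-text citation counts per reference."""
    root = ET.fromstring(jats_xml)
    title = root.findtext(".//article-meta/title-group/article-title", default="")
    # Reference ids, in the order they appear in the reference list
    ref_ids = [ref.get("id") for ref in root.iter("ref")]
    counts = {rid: 0 for rid in ref_ids}
    # Count bibliographic cross-references anywhere in the document;
    # "rid" may hold several space-separated ids, so split it.
    for xref in root.iter("xref"):
        if xref.get("ref-type") == "bibr":
            for rid in xref.get("rid", "").split():
                if rid in counts:
                    counts[rid] += 1
    return {"title": title, "references": ref_ids, "citation_counts": counts}


if __name__ == "__main__":
    print(extract_citation_data(SAMPLE_JATS))
```

Counting in-text mentions per reference, rather than only listing references, is one example of the kind of full-text signal that is unavailable when only bibliographic metadata is indexed.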