Generating Topic Pages for Scientific Concepts Using Scientific Publications
Topic Pages provide a scalable resource for learners and researchers to access scientific information, though the contribution is incremental: it applies existing NLP/ML methods to a new dataset.
The paper tackles the problem of helping readers understand scientific concepts by automatically generating Topic Pages from publications. The production collection contains over 360,000 pages across 20 scientific domains and averages 23 million unique visits per month.
In this paper, we describe Topic Pages, an inventory of scientific concepts and the information around them, extracted from a large collection of scientific books and journals. The main aim of Topic Pages is to give readers the information they need to understand scientific concepts they come across while reading scholarly content in any scientific domain. Topic Pages are a collection of information pages generated automatically using NLP and ML, each corresponding to a scientific concept. Each page contains three pieces of information: a definition, related concepts, and the most relevant snippets, all extracted from peer-reviewed scientific publications. We discuss the components that extract each of these elements. The collection of pages in production contains over 360,000 Topic Pages across 20 different scientific domains, with an average of 23 million unique visits per month, making it a popular source of scientific information.