CL | DL | IR | Apr 24, 2023

Generating Topic Pages for Scientific Concepts Using Scientific Publications

arXiv:2304.11922v1 | 4 citations | h-index: 62
Originality: Synthesis-oriented
AI Analysis

Topic Pages give learners and researchers a scalable resource for accessing scientific information, though the contribution is incremental: it applies existing NLP/ML methods to a new dataset.

The paper tackles the problem of helping readers understand scientific concepts by automatically generating Topic Pages from publications, resulting in over 360,000 pages across 20 domains with 23 million monthly visits.

In this paper, we describe Topic Pages, an inventory of scientific concepts and the information around them, extracted from a large collection of scientific books and journals. The main aim of Topic Pages is to give readers the information they need to understand the scientific concepts they come across while reading scholarly content in any scientific domain. Topic Pages are a collection of information pages generated automatically using NLP and ML, each corresponding to a scientific concept. Each page contains three pieces of information: a definition, related concepts, and the most relevant snippets, all extracted from peer-reviewed scientific publications. We discuss the details of the components that extract each of these elements. The collection of pages in production contains over 360,000 Topic Pages across 20 different scientific domains, with an average of 23 million unique visits per month, making it a popular source of scientific information.
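The three elements the abstract lists for each page (a definition, related concepts, and relevant snippets) can be pictured as a simple record. The following is only an illustrative sketch: the class names, field names, and the `source` identifier are assumptions made here for illustration, not the schema used by the authors, and the sample values are placeholders rather than real page content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snippet:
    """A passage extracted from a publication (hypothetical structure)."""
    text: str
    source: str  # assumed identifier of the source publication


@dataclass
class TopicPage:
    """One page per scientific concept, holding the three elements
    named in the abstract. Field names are illustrative assumptions."""
    concept: str
    domain: str
    definition: str
    related_concepts: List[str] = field(default_factory=list)
    snippets: List[Snippet] = field(default_factory=list)


# Placeholder values only; not taken from any real Topic Page.
page = TopicPage(
    concept="Example concept",
    domain="Example domain",
    definition="Placeholder definition text.",
    related_concepts=["Related concept A", "Related concept B"],
    snippets=[Snippet(text="Placeholder snippet.", source="placeholder-id")],
)
```

Keeping the definition as a single string and the other two elements as lists mirrors the abstract's wording (one definition, several related concepts and snippets); a production system would likely attach more metadata to each element.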

