CL | Mar 23, 2025

Exploring Topic Trends in COVID-19 Research Literature using Non-Negative Matrix Factorization

arXiv:2503.18182v1 | 1 citation | h-index: 2 | IEEE Trans Artif Intell
Originality: Synthesis-oriented
AI Analysis

This work gives a structured overview of COVID-19 research topics that helps researchers navigate the literature. Its contribution is incremental, since it applies an existing method to a new dataset.

The researchers applied Non-Negative Matrix Factorization (NMF) to the COVID-19 Open Research Dataset (CORD-19) to identify and track thematic trends in COVID-19 literature over time, using stability analysis to select the number of topics.

In this work, we apply topic modeling with Non-Negative Matrix Factorization (NMF) to the COVID-19 Open Research Dataset (CORD-19) to uncover the thematic structure of the extensive COVID-19 research literature and how that structure evolves. NMF factorizes the document-term matrix into two non-negative matrices, one representing the topics as weights over terms and the other representing the distribution of topics across documents. These factors show how strongly each document relates to each topic and how strongly each topic relates to each word. We describe the complete methodology, which involves a series of pre-processing steps that standardize the available text while preserving the context of phrases, followed by feature extraction using term frequency-inverse document frequency (tf-idf), which weights each word by its frequency within a document and its rarity across the dataset. To ensure the robustness of our topic model, we conduct a stability analysis that computes stability scores of the NMF topic model for different numbers of topics, enabling us to select the optimal number of topics. We then track the evolution of topics over time within the CORD-19 dataset. Our findings contribute to the understanding of the knowledge structure of the COVID-19 research landscape and provide a resource for future research in this field.
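
To make the factorization concrete, here is a minimal sketch of the tf-idf and NMF steps using only the Python standard library. It is an illustration under stated assumptions, not the authors' pipeline: the toy documents are invented (they are not CORD-19 records), the smoothed idf formula is one common variant, and the solver uses Lee and Seung's multiplicative updates, which the abstract does not specify. The helper names (`tfidf_matrix`, `nmf`, `top_terms`, `frobenius_error`) are hypothetical. With V as the document-term matrix, the sketch looks for non-negative W (documents by topics) and H (topics by terms) such that V is approximately WH.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the tf-idf weight of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf (an assumption; several variants are in common use).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize v (n x m) into w (n x k) and h (k x m) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of v - w h, the reconstruction error."""
    wh = matmul(w, h)
    return math.sqrt(sum((x - y) ** 2
                         for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)))


def top_terms(h, vocab, n=3):
    """For each topic (row of h), list the n highest-weighted terms."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]


# Toy corpus, invented for illustration only.
docs = [
    "vaccine trial efficacy dose",
    "vaccine dose immune response",
    "immune response antibody vaccine",
    "lockdown mobility transmission policy",
    "policy lockdown school closure",
    "mobility transmission lockdown data",
]
vocab, v = tfidf_matrix(docs)
w, h = nmf(v, k=2)
for i, terms in enumerate(top_terms(h, vocab)):
    print(f"topic {i}: {terms}")
print("reconstruction error:", round(frobenius_error(v, w, h), 4))
```

Each row of `w` gives the topic weights of one document and each row of `h` gives the term weights of one topic, which is the document-topic and topic-word reading described in the abstract. A stability analysis of the kind mentioned there would repeat the fit for several values of `k` and compare how consistent the resulting topics are; that step is not shown here.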

