HC · LG · Jan 17, 2025

Visual Exploration of Stopword Probabilities in Topic Models

arXiv:2501.10137v1 · 4 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses the suboptimal performance and reduced user confidence that poor stopword handling causes in topic model visualizations for practitioners and stakeholders, though it is an incremental improvement over existing stopword analysis methods.

The paper tackles stopword removal in topic models by proposing a corpus-specific probabilistic estimation method and an interactive visualization system. The authors report that the system increased user confidence in model credibility through reasonable probabilities, extended stopword lists, and adjustable thresholds.

Stopword removal is a critical stage in many Machine Learning methods but often receives little consideration, which interferes with model visualizations and disrupts user confidence. Inappropriately chosen or hastily omitted stopwords not only lead to suboptimal performance but also significantly affect the quality of models, thus reducing the willingness of practitioners and stakeholders to rely on the output visualizations. This paper proposes a novel extraction method that provides a corpus-specific probabilistic estimation of stopword likelihood and an interactive visualization system to support stopword analysis. We evaluated our approach and interface using real-world data, a commonly used Machine Learning method (Topic Modelling), and a comprehensive qualitative experiment probing user confidence. The results of our work show that our system increases user confidence in the credibility of topic models by (1) returning reasonable probabilities, (2) generating an appropriate and representative extension of common stopword lists, and (3) providing an adjustable threshold for estimating and analyzing stopwords visually. Finally, we discuss insights, recommendations, and best practices to support practitioners while improving the output of Machine Learning methods and topic model visualizations with robust stopword analysis and removal.
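The abstract does not specify how the stopword likelihood is estimated, so the following is only a minimal sketch of the general idea of a corpus-specific score combined with an adjustable threshold. It uses plain document frequency as a stand-in proxy, which is an assumption and not the paper's method; the function names `stopword_scores` and `candidate_stopwords` and the toy corpus are hypothetical.

```python
from collections import Counter


def stopword_scores(docs):
    """Score each word by the fraction of documents that contain it.

    Assumption: document frequency is used here as a simple proxy for
    "stopword likelihood"; the paper's estimator is not described in
    the abstract and may differ substantially.
    """
    doc_freq = Counter()
    for doc in docs:
        # Count each word at most once per document.
        doc_freq.update(set(doc.lower().split()))
    n_docs = len(docs)
    return {word: count / n_docs for word, count in doc_freq.items()}


def candidate_stopwords(scores, threshold=0.8):
    """Return words whose score meets the (adjustable) threshold."""
    return sorted(word for word, score in scores.items() if score >= threshold)


# Hypothetical toy corpus for illustration only.
docs = [
    "the model learns topics from the corpus",
    "the corpus contains the documents",
    "topics describe the documents",
]
scores = stopword_scores(docs)

strict = candidate_stopwords(scores, threshold=0.8)
loose = candidate_stopwords(scores, threshold=0.6)
print(strict)
print(loose)
```

Lowering the threshold extends the candidate list with corpus-specific frequent words, which is the kind of trade-off an adjustable threshold lets an analyst inspect before removing terms from a topic model.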

