IR | SI | NA | Oct 4, 2020

Sparseness-constrained Nonnegative Tensor Factorization for Detecting Topics at Different Time Scales

arXiv:2010.01600v3 | 9 citations
Originality: Incremental advance
AI Analysis

This work addresses topic modeling for temporal data like news or social media, offering incremental improvements in controlling topic persistence.

The paper tackles the problem of detecting both long-lasting trends and short-lived topics in temporal data. It proposes sparseness-constrained nonnegative CANDECOMP/PARAFAC decomposition (S-NCPD) and an online variant, both of which control the length of the learned topics. On semi-synthetic and real-world news headline data, the two methods recover short and long-lasting topics in a controlled manner, and the online variant reduces the reconstruction error more rapidly than S-NCPD.

Temporal data (such as news articles or Twitter feeds) often consists of a mixture of long-lasting trends and popular but short-lasting topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and clearly locate them in time. In this paper, we first show that nonnegative CANDECOMP/PARAFAC decomposition (NCPD) is able to discover topics of variable persistence automatically. Then, we propose sparseness-constrained NCPD (S-NCPD) and its online variant in order to actively control the length of the learned topics effectively and efficiently. Further, we propose quantitative ways to measure the topic length and demonstrate the ability of S-NCPD (as well as its online variant) to discover short and long-lasting temporal topics in a controlled manner in semi-synthetic and real-world data including news headlines. We also demonstrate that the online variant of S-NCPD reduces the reconstruction error more rapidly than S-NCPD.
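To make the abstract's ingredients concrete, here is a minimal sketch of plain NCPD on a 3-way tensor, together with one possible sparseness score for a temporal factor. It is an illustration under stated assumptions, not the paper's algorithm. The multiplicative-update solver, the choice of mode 0 as the time axis, and the use of Hoyer's sparseness measure as a stand-in for "topic length" are all assumptions made here. The function names (`ncpd`, `hoyer_sparseness`, `rel_error`) are hypothetical.

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Kronecker product: (J, R) and (K, R) -> (J*K, R).
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def unfold(X, mode):
    # Mode-n unfolding; the remaining axes keep their original order.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def ncpd(X, rank, n_iter=200, seed=0, eps=1e-12):
    # Nonnegative CP decomposition of a 3-way tensor via multiplicative
    # updates (an assumed solver, chosen here for brevity).
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(X, mode) @ kr
            # (kr.T @ kr) equals the Hadamard product of the two Gram matrices.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] *= numer / (factors[mode] @ gram + eps)
    return factors

def rel_error(X, factors):
    # Relative Frobenius reconstruction error of the rank-R model.
    A, B, C = factors
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def hoyer_sparseness(x):
    # Hoyer's measure: 1 for a one-hot vector, 0 for a constant vector.
    # Applied to a column of the temporal factor, a value near 1 suggests
    # a short-lived topic (assumption: mode 0 indexes time).
    n = x.size
    l2 = np.linalg.norm(x)
    if l2 == 0:
        return 0.0
    return (np.sqrt(n) - np.abs(x).sum() / l2) / (np.sqrt(n) - 1)
```

Given a nonnegative tensor `X` with time along mode 0, `A, B, C = ncpd(X, rank=5)` returns the three factor matrices, and `hoyer_sparseness(A[:, r])` scores how concentrated in time topic `r` is. S-NCPD as described above would additionally constrain such a sparseness level during fitting, which this sketch does not do.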

Code Implementations: 1 repo
