CL · Jan 29, 2022

The Powered Dirichlet-Hawkes Process as a Flexible Prior for Temporal Text Clustering

arXiv:2201.12568v1
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of clustering documents whose temporal and textual data are limited or uncorrelated. The advance appears incremental, since it generalizes previous work such as DHP and UP.

The authors tackle the problem of clustering textual documents when either the temporal information or the textual content is weakly informative. They develop the Powered Dirichlet-Hawkes process (PDHP), which yields significantly better results than state-of-the-art models in those settings.

The textual content of a document and its publication date are intertwined. For example, the publication of a news article on a topic is influenced by previous publications on similar issues, according to underlying temporal dynamics. However, it can be challenging to retrieve meaningful information when the text itself conveys little. Furthermore, the textual content of a document is not always correlated with its temporal dynamics. We develop a method that clusters textual documents according to both their content and their publication time, the Powered Dirichlet-Hawkes process (PDHP). PDHP yields significantly better results than state-of-the-art models when temporal information or textual content is weakly informative. PDHP also relaxes the assumption that textual content and temporal dynamics are perfectly correlated. We demonstrate that PDHP generalizes previous work, such as DHP and UP. Finally, we illustrate a possible application using a real-world dataset from Reddit.
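
The abstract does not spell out the form of the prior. As a rough illustration only, the sketch below assumes that a "powered" temporal prior raises each cluster's Hawkes-style intensity to an exponent r before normalising, with a constant weight for opening a new cluster. The exponential kernel and the names `cluster_intensity`, `powered_prior`, `r`, `lambda0`, and `decay` are all hypothetical choices made for this example, not taken from the paper.

```python
import math


def cluster_intensity(event_times, t, decay=1.0):
    """Hawkes-style intensity of one cluster at time t.

    Assumption: an exponential kernel summed over the cluster's past events.
    """
    return sum(math.exp(-decay * (t - s)) for s in event_times if s < t)


def powered_prior(clusters, t, r=1.0, lambda0=0.1, decay=1.0):
    """Prior probabilities of joining each existing cluster or a new one.

    clusters: list of lists of past event times, one list per cluster.
    r:        hypothetical exponent applied to each cluster's intensity.
    lambda0:  hypothetical constant weight for opening a new cluster.
    The last entry of the returned list is the new-cluster probability.
    """
    weights = [cluster_intensity(times, t, decay) ** r for times in clusters]
    weights.append(lambda0)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    clusters = [[0.0, 0.5, 0.9], [0.2]]
    for r in (0.0, 0.5, 1.0):
        print(r, powered_prior(clusters, t=1.0, r=r))
```

Under these assumptions, r = 1 leaves the intensities untouched, so recently active clusters dominate, while r = 0 gives every existing cluster the same weight regardless of its temporal activity. Intermediate values of r control how strongly the temporal signal shapes the prior relative to the textual likelihood.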
