IR CL LG

Feb 2, 2021

Deep Autoencoder-based Fuzzy C-Means for Topic Detection

Hendri Murfi, Natasha Rosaline, Nora Hariadi

arXiv:2102.02636v1

16 citations

Originality: Incremental advance

AI Analysis

This work provides an incremental improvement in topic detection for researchers and practitioners working with textual data.

This paper tackles topic detection in textual data by proposing Deep Autoencoder-based Fuzzy C-Means (DFCM). DFCM improves the coherence score of eigenspace-based fuzzy c-means (EFCM) and performs comparably to leading standard methods like NMF and LDA.

Topic detection is the process of determining topics from a collection of textual data. One family of topic detection methods is clustering-based and treats the cluster centroids as the topics. Clustering has the advantage that it can process data whose representations contain negative values, so it can be combined with a broader range of representation learning methods. In this paper, we adopt deep learning for topic detection by combining a deep autoencoder with fuzzy c-means, a method we call deep autoencoder-based fuzzy c-means (DFCM). The autoencoder's encoder learns a lower-dimensional representation of the data. Fuzzy c-means then groups the lower-dimensional representations to identify the centroids. Finally, the autoencoder's decoder transforms the centroids back into the original representation, where they are interpreted as the topics. Our simulations show that DFCM improves the coherence score of eigenspace-based fuzzy c-means (EFCM) and is comparable to the leading standard methods, i.e., nonnegative matrix factorization (NMF) and latent Dirichlet allocation (LDA).
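The encode, cluster, decode pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: a shallow one-hidden-layer autoencoder stands in for the deep autoencoder, and the toy vocabulary, the document-term matrix, the hyperparameters, and the function names (train_autoencoder, fuzzy_c_means, dfcm_topics) are all assumptions made for the example.

```python
import numpy as np

def train_autoencoder(X, code_dim, lr=0.05, epochs=2000, seed=0):
    """Shallow stand-in for the deep autoencoder: tanh encoder, linear decoder."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, code_dim)); b1 = np.zeros(code_dim)
    W2 = rng.normal(0.0, 0.1, (code_dim, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # codes (may be negative)
        R = H @ W2 + b2                     # reconstruction
        G = 2.0 * (R - X) / n               # gradient of mean squared error w.r.t. R
        GH = (G @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh
        W2 -= lr * (H.T @ G); b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH); b1 -= lr * GH.sum(axis=0)
    encode = lambda A: np.tanh(A @ W1 + b1)
    decode = lambda Z: Z @ W2 + b2
    return encode, decode

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means; returns (centroids, membership matrix)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)       # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        inv = np.maximum(dist, 1e-12) ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centroids, U

def dfcm_topics(X, n_topics, code_dim):
    """Encode documents, cluster the codes, decode the centroids as topics."""
    encode, decode = train_autoencoder(X, code_dim)
    centroids, U = fuzzy_c_means(encode(X), n_topics)
    return decode(centroids), U             # topics live in the original word space

# Hypothetical toy corpus: rows are documents, columns are word counts.
vocab = ["ball", "goal", "team", "vote", "law", "party"]
X = np.array([[2, 1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0, 0],
              [1, 1, 2, 0, 0, 0],
              [0, 0, 0, 2, 1, 1],
              [0, 0, 0, 1, 2, 1],
              [0, 0, 0, 1, 1, 2]], dtype=float)
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-length rows

topics, U = dfcm_topics(X, n_topics=2, code_dim=2)
top_terms = np.argsort(-topics, axis=1)[:, :3]  # three highest-weight words per topic
for k, idx in enumerate(top_terms):
    print(f"topic {k}:", [vocab[i] for i in idx])
```

Each decoded centroid is a vector over the vocabulary, so reading off its largest entries gives the words that characterize the topic. Because the tanh codes can be negative, the sketch also shows why a clustering step, which places no sign constraint on its input, pairs naturally with a learned representation.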
