IR LG ML Sep 13, 2020

AOBTM: Adaptive Online Biterm Topic Modeling for Version Sensitive Short-texts Analysis

arXiv:2009.09930v13.01 citations Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of dynamic topic modeling in short texts for mobile app developers, but the contribution is incremental because it builds on existing online topic modeling methods.

The paper tackles the problem of analyzing short texts, such as mobile app reviews, over time. It proposes AOBTM to model topics adaptively and reports that AOBTM finds more coherent topics than state-of-the-art baselines in automatic evaluations.

Analysis of mobile app reviews plays an important role in requirements engineering, software maintenance, and the evolution of mobile apps. Mobile app developers check their users' reviews frequently to clarify the issues experienced by users or to capture new issues introduced by a recent app update. App reviews are dynamic, and the topics they discuss change over time. Changes in the topics among reviews collected for different versions of an app can reveal important issues about an app update. A main technique in this analysis is topic modeling. However, app reviews are short texts, and it is challenging to unveil their latent topics over time. Conventional topic models suffer from the sparsity of word co-occurrence patterns when inferring topics for short texts. Furthermore, these algorithms cannot capture topics over numerous consecutive time-slices. Online topic modeling algorithms speed up the inference of topic models for the texts collected in the latest time-slice by saving a fraction of data from the previous time-slice. But these algorithms do not analyze the statistical data of all the previous time-slices, which can contribute to the topic distribution of the current time-slice. We propose the Adaptive Online Biterm Topic Model (AOBTM) to model topics in short texts adaptively. AOBTM alleviates the sparsity problem in short texts and considers the statistical data for an optimal number of previous time-slices. We also propose parallel algorithms to automatically determine the optimal number of topics and the best number of previous versions to consider in the topic inference phase. Automatic evaluation on collections of app reviews and real-world short text datasets confirms that AOBTM finds more coherent topics and outperforms the state-of-the-art baselines.
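
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows (1) biterm extraction, where a biterm is an unordered pair of words co-occurring in one short text, which is how biterm models sidestep sparse per-document word counts, and (2) one possible way to fold statistics from a window of previous time-slices into the current slice. The function names, the whitespace tokenizer, and the geometric `decay` weighting are assumptions made for illustration; the abstract does not specify how AOBTM weights earlier slices.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def slice_biterm_counts(reviews):
    """Count biterms over all reviews in one time-slice (e.g. one app version)."""
    counts = Counter()
    for review in reviews:
        # Assumption: naive lowercase whitespace tokenization.
        counts.update(extract_biterms(review.lower().split()))
    return counts


def windowed_counts(slices, window, decay=0.5):
    """Combine the latest slice's biterm counts with down-weighted counts
    from up to `window` previous slices.

    Assumption: a slice that is `age` steps old gets weight decay**age.
    This is a hypothetical weighting, not the scheme from the paper.
    """
    combined = Counter()
    current = len(slices) - 1
    for age in range(window + 1):
        idx = current - age
        if idx < 0:
            break
        weight = decay ** age
        for biterm, n in slice_biterm_counts(slices[idx]).items():
            combined[biterm] += weight * n
    return combined


if __name__ == "__main__":
    versions = [
        ["app crashes"],                 # reviews for an older version
        ["app crashes", "login fails"],  # reviews for the latest version
    ]
    print(windowed_counts(versions, window=1))
```

In this sketch, `window` plays the role of the "number of previous versions" mentioned in the abstract. The paper proposes determining that value automatically, whereas here it is simply a parameter.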
