Rebecca M. C. Taylor

LG
4papers
6citations
Novelty34%
AI Score18

4 Papers

LGAug 19, 2022
SimLDA: A tool for topic model evaluation

Rebecca M. C. Taylor, Johan A. du Preez

Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Using coherence measures we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets.

LGAug 25, 2022
Rail break and derailment prediction using Probabilistic Graphical Modelling

Rebecca M. C. Taylor, Johan A. du Preez

Rail breaks are one of the most common causes of derailments internationally. This is no different for the South African Iron Ore line. Many rail breaks occur as a heavy-haul train passes over a crack, large defect or defective weld. In such cases, it is usually too late for the train to slow down in time to prevent a de-railment. Knowing the risk of a rail break occurring associated with a train passing over a section of rail allows for better implementation of maintenance initiatives and mitigating measures. In this paper the Ore Line's specific challenges are discussed and the currently available data that can be used to create a rail break risk prediction model is reviewed. The development of a basic rail break risk prediction model for the Ore Line is then presented. Finally the insight gained from the model is demonstrated by means of discussing various scenarios of various rail break risk. In future work, we are planning on extending this basic model to allow input from live monitoring systems such as the ultrasonic broken rail detection system.

LGNov 2, 2021
A derivation of variational message passing (VMP) for latent Dirichlet allocation (LDA)

Rebecca M. C. Taylor, Dirko Coetsee, Johan A. du Preez

Latent Dirichlet Allocation (LDA) is a probabilistic model used to uncover latent topics in a corpus of documents. Inference is often performed using variational Bayes (VB) algorithms, which calculate a lower bound to the posterior distribution over the parameters. Deriving the variational update equations for new models requires considerable manual effort; variational message passing (VMP) has emerged as a "black-box" tool to expedite the process of variational inference. But applying VMP in practice still presents subtle challenges, and the existing literature does not contain the steps that are necessary to implement VMP for the standard smoothed LDA model, nor are available black-box probabilistic graphical modelling software able to do the word-topic updates necessary to implement LDA. In this paper, we therefore present a detailed derivation of the VMP update equations for LDA. We see this as a first step to enabling other researchers to calculate the VMP updates for similar graphical models.

LGOct 1, 2021
ALBU: An approximate Loopy Belief message passing algorithm for LDA to improve performance on small data sets

Rebecca M. C. Taylor, Johan A. du Preez

Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Additionally, to perform more fine grained evaluations and comparisons, we use simulations that enable comparisons with the ground truth via Kullback-Leibler divergence (KLD). Using coherence measures for the text corpora and KLD with the simulations we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets.