Johan du Preez

LG
8 papers
67 citations
Novelty 54%
AI Score 26

8 Papers

AS Jun 23, 2022
A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery

Werner van der Merwe, Herman Kamper, Johan du Preez

Latent Dirichlet allocation (LDA) is widely used for unsupervised topic modelling on sets of documents. No temporal information is used in the model. However, there is often a relationship between the corresponding topics of consecutive tokens. In this paper, we present an extension to LDA that uses a Markov chain to model temporal information. We use this new model for acoustic unit discovery from speech. As input tokens, the model takes a discretised encoding of speech from a vector quantised (VQ) neural network with 512 codes. The goal is then to map these 512 VQ codes to 50 phone-like units (topics) in order to more closely resemble true phones. In contrast to the base LDA, which only considers how VQ codes co-occur within utterances (documents), the Markov chain LDA additionally captures how consecutive codes follow one another. This extension leads to an increase in cluster quality and phone segmentation results compared to the base LDA. Compared to a recent vector quantised neural network approach that also learns 50 units, the extended LDA model performs better in phone segmentation but worse in mutual information.

CV Mar 21, 2023
On the link between generative semi-supervised learning and generative open-set recognition

Emile Reyn Engelbrecht, Johan du Preez

This study investigates the relationship between semi-supervised learning (SSL, which is training on partially labelled datasets) and open-set recognition (OSR, which is classification with simultaneous novelty detection) in the context of generative adversarial networks (GANs). Although no previous study has formally linked SSL and OSR, their respective methods share striking similarities. Specifically, SSL-GANs and OSR-GANs require their generators to produce 'bad-looking' samples, which are used to regularise their classifier networks. We hypothesise that the definitions of bad-looking samples in SSL and OSR represent the same concept and realise the same goal. More formally, bad-looking samples lie in the complementary space, which is the area between and around the boundaries of the labelled categories within the classifier's embedding space. By regularising a classifier with samples in the complementary space, classifiers achieve improved generalisation for SSL and also generalise the open space for OSR. To test this hypothesis, we compare a foundational SSL-GAN with the state-of-the-art OSR-GAN under the same SSL-OSR experimental conditions. Our results find that SSL-GANs achieve near-identical results to OSR-GANs, proving the SSL-OSR link. Subsequently, to further this new research path, we compare several SSL-GANs in various SSL-OSR setups, which gives the first benchmark results. A combined framework of SSL-OSR certainly improves the practicality and cost-efficiency of classifier training, and so further theoretical and application studies are also discussed.

SD Oct 19, 2021
Temporal separation of whale vocalizations from background oceanic noise using a power calculation

Jacques van Wyk, Jaco Versfeld, Johan du Preez

Analyzing audio signals in search of cetacean vocalizations is in many cases an arduous task, requiring many complex computations, a plethora of digital processing techniques and scrutiny of the audio signal with a fine-tooth comb to determine where the vocalizations are located. To ease this process, a computationally efficient and noise-resistant method for determining whether an audio segment contains a potential cetacean call is developed here, using a robust power calculation for stationary Gaussian noise signals and a recursive method for determining the mean and variance of a given sample frame. The resulting detector is tested on audio recordings containing southern right whale sounds, and its performance is compared to a contemporary energy detector and a popular deep learning method. The detector exhibits good performance at moderate-to-high signal-to-noise ratio values. It is easy to implement, computationally efficient and robust enough to accurately detect whale vocalizations in a noisy underwater environment.
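
The abstract does not spell out its recursive update, so the following is only a minimal sketch of one common choice, Welford's single-pass recurrence, paired with a simple power threshold. The function names, the `noise_power` argument and the `factor` threshold are hypothetical and are not taken from the paper.

```python
def running_mean_variance(samples):
    """Welford's recursive update: one pass over the frame, O(1) memory."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in samples:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # accumulates the sum of squared deviations
    variance = m2 / count if count else 0.0  # population variance
    return mean, variance


def frame_exceeds_noise(frame, noise_power, factor=2.0):
    """Flag a frame whose mean power exceeds `factor` times the noise power.

    `factor` is an illustrative threshold, not a value from the paper.
    """
    mean, variance = running_mean_variance(frame)
    power = variance + mean * mean  # mean square of the frame
    return power > factor * noise_power


if __name__ == "__main__":
    print(running_mean_variance([1.0, 2.0, 3.0, 4.0]))
    print(frame_exceeds_noise([3.0, -3.0, 3.0, -3.0], noise_power=1.0))
```

The recurrence avoids storing the frame or making a second pass, which is the kind of saving a computationally light detector would want. The detector in the paper may set its threshold quite differently.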

CV Oct 7, 2021
A Probabilistic Graphical Model Approach to the Structure-and-Motion Problem

Simon Streicher, Willie Brink, Johan du Preez

We present a means of formulating and solving the well-known structure-and-motion problem in computer vision with probabilistic graphical models. We model the unknown camera poses and 3D feature coordinates as well as the observed 2D projections as Gaussian random variables, using sigma point parameterizations to effectively linearize the nonlinear relationships between these variables. The variables involved in each projection are grouped into a cluster, and we connect the clusters in a cluster graph. Loopy belief propagation is performed over this graph, in an iterative re-initialization and estimation procedure, and we find that our approach shows promise both in simulation and on real-world data. The PGM is easily extendable to include additional parameters or constraints.
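
To illustrate what a sigma point parameterization does, here is a minimal scalar sketch of the classic unscented transform: a Gaussian is represented by a few weighted points, each point is pushed through the nonlinear function, and a Gaussian is refitted to the results. This is an assumption about the general technique, not the paper's formulation, which works with camera poses and 3D points. The function name and the default `kappa=2.0` are illustrative choices.

```python
import math


def unscented_transform_1d(mean, variance, f, kappa=2.0):
    """Approximate the mean and variance of f(x) for x ~ N(mean, variance).

    Uses three sigma points (dimension n = 1) with the classic weights.
    """
    n = 1
    spread = math.sqrt((n + kappa) * variance)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    values = [f(p) for p in points]
    out_mean = sum(w * y for w, y in zip(weights, values))
    out_var = sum(w * (y - out_mean) ** 2 for w, y in zip(weights, values))
    return out_mean, out_var


if __name__ == "__main__":
    # For a linear map the result is exact up to rounding.
    print(unscented_transform_1d(3.0, 4.0, lambda x: 2.0 * x + 1.0))
    # A nonlinear map, to show the approximation at work.
    print(unscented_transform_1d(0.0, 1.0, lambda x: x * x))
```

For a linear function the transform recovers the exact output mean and variance. For nonlinear functions it gives a Gaussian approximation without requiring derivatives.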

LG Oct 5, 2021
Graph Coloring: Comparing Cluster Graphs to Factor Graphs

Simon Streicher, Johan du Preez

We present a means of formulating and solving graph coloring problems with probabilistic graphical models. In contrast to the prevalent literature that uses factor graphs for this purpose, we instead approach it from a cluster graph perspective. Since there seems to be a lack of algorithms to automatically construct valid cluster graphs, we provide such an algorithm (termed LTRIP). Our experiments indicate a significant advantage for cluster graphs over factor graphs, in terms of both accuracy and computational efficiency.

LG Sep 30, 2021
Strengthening Probabilistic Graphical Models: The Purge-and-merge Algorithm

Simon Streicher, Johan du Preez

Probabilistic graphical models (PGMs) are powerful tools for solving systems of complex relationships over a variety of probability distributions. However, while tree-structured PGMs always result in efficient and exact solutions, inference on graph (or loopy) structured PGMs is not guaranteed to discover the optimal solutions. It is in principle possible to convert loopy PGMs to an equivalent tree structure, but this is usually impractical for interesting problems due to exponential blow-up. To address this, we developed the purge-and-merge algorithm. This algorithm iteratively nudges a malleable graph structure towards a tree structure by selectively merging factors. The merging process is designed to avoid exponential blow-up by way of sparse structures from which redundancy is purged as the algorithm progresses. We set up tasks to test the algorithm on constraint-satisfaction puzzles such as Sudoku, Fill-a-pix, and Kakuro, and it outperformed other PGM-based approaches reported in the literature. While the tasks we set focussed on the binary logic of CSP, we believe the purge-and-merge algorithm could be extended to general PGM inference.

LG Oct 23, 2019
Stabilising priors for robust Bayesian deep learning

Felix McGregor, Arnu Pretorius, Johan du Preez et al.

Bayesian neural networks (BNNs) have developed into useful tools for probabilistic modelling due to recent advances in variational inference enabling large-scale BNNs. However, BNNs remain brittle and hard to train, especially (1) when using deep architectures consisting of many hidden layers and (2) in situations with large weight variances. We use signal propagation theory to quantify these challenges and propose self-stabilising priors. This is achieved by a reformulation of the ELBO to allow the prior to influence network signal propagation. Then, we develop a stabilising prior, where the distributional parameters of the prior are adjusted before each forward pass to ensure stability of the propagating signal. This stabilised signal propagation leads to improved convergence and robustness, making it possible to train deeper networks and to train in noisier settings.

AP Apr 8, 2013
The PAV algorithm optimizes binary proper scoring rules

Niko Brummer, Johan du Preez

There has been much recent interest in applying the pool-adjacent-violators (PAV) algorithm to calibrate the probabilistic outputs of automatic pattern recognition and machine learning algorithms. Special cost functions, known as proper scoring rules, form natural objective functions to judge the goodness of such calibration. We show that for binary pattern classifiers, the non-parametric optimization of calibration, subject to a monotonicity constraint, can be solved by PAV and that this solution is optimal for all regular binary proper scoring rules. This extends previous results which were limited to convex binary proper scoring rules. We further show that this result holds not only for calibration of probabilities, but also for calibration of log-likelihood-ratios, in which case optimality holds independently of the prior probabilities of the pattern classes.
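
As a rough illustration of the monotone calibration described above, the following is a minimal sketch of PAV for binary labels. It sorts trials by score and pools adjacent blocks whenever the empirical proportion of positives would decrease. The function name and the tie-handling rule are assumptions made for this example, not details from the paper.

```python
def pav_calibrate(scores, labels):
    """Return monotone calibrated probabilities, one per trial, in input order.

    `labels` are 0 or 1. Tied scores are ordered so that they always pool
    into the same block (an illustrative choice).
    """
    order = sorted(range(len(scores)), key=lambda i: (scores[i], -labels[i]))
    blocks = []  # each block is [sum of labels, number of trials]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Pool while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    calibrated = [0.0] * len(scores)
    pos = 0
    for total, count in blocks:
        for i in order[pos:pos + count]:
            calibrated[i] = total / count  # block mean = proportion of positives
        pos += count
    return calibrated


if __name__ == "__main__":
    print(pav_calibrate([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Each output value is the fraction of positive labels in its pooled block, so the calibrated values never decrease as the score increases.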