Terence L. van Zyl

h-index15

29papers

181citations

Novelty39%

AI Score47

Ranked #32,990 of 194,257 authors (top 17%)#7,784 in LG (top 19%)

29 Papers

5.8LGApr 26, 2022Code

Using Machine Learning to Fuse Verbal Autopsy Narratives and Binary Features in the Analysis of Deaths from Hyperglycaemia

Thokozile Manaka, Terence Van Zyl, Alisha N Wade et al.

Lower-and-middle income countries are faced with challenges arising from a lack of data on cause of death (COD), which can limit decisions on population health and disease management. A verbal autopsy(VA) can provide information about a COD in areas without robust death registration systems. A VA consists of structured data, combining numeric and binary features, and unstructured data as part of an open-ended narrative text. This study assesses the performance of various machine learning approaches when analyzing both the structured and unstructured components of the VA report. The algorithms were trained and tested via cross-validation in the three settings of binary features, text features and a combination of binary and text features derived from VA reports from rural South Africa. The results obtained indicate narrative text features contain valuable information for determining COD and that a combination of binary and text features improves the automated COD classification task. Keywords: Diabetes Mellitus, Verbal Autopsy, Cause of Death, Machine Learning, Natural Language Processing

0.6CLOct 31, 2022

Improving Cause-of-Death Classification from Verbal Autopsy Reports

Thokozile Manaka, Terence van Zyl, Deepak Kar

In many lower-and-middle income countries including South Africa, data access in health facilities is restricted due to patient privacy and confidentiality policies. Further, since clinical data is unique to individual institutions and laboratories, there are insufficient data annotation standards and conventions. As a result of the scarcity of textual data, natural language processing (NLP) techniques have fared poorly in the health sector. A cause of death (COD) is often determined by a verbal autopsy (VA) report in places without reliable death registration systems. A non-clinician field worker does a VA report using a set of standardized questions as a guide to uncover symptoms of a COD. This analysis focuses on the textual part of the VA report as a case study to address the challenge of adapting NLP techniques in the health domain. We present a system that relies on two transfer learning paradigms of monolingual learning and multi-source domain adaptation to improve VA narratives for the target task of the COD classification. We use the Bidirectional Encoder Representations from Transformers (BERT) and Embeddings from Language Models (ELMo) models pre-trained on the general English and health domains to extract features from the VA narratives. Our findings suggest that this transfer learning system improves the COD classification tasks and that the narrative text contains valuable information for figuring out a COD. Our results further show that combining binary VA features and narrative text features learned via this framework boosts the classification task of COD.

5.8LGMar 7, 2022

Evaluating State of the Art, Forecasting Ensembles- and Meta-learning Strategies for Model Fusion

Pieter Cawood, Terence van Zyl

Techniques of hybridisation and ensemble learning are popular model fusion techniques for improving the predictive power of forecasting methods. With limited research that instigates combining these two promising approaches, this paper focuses on the utility of the Exponential-Smoothing-Recurrent Neural Network (ES-RNN) in the pool of base models for different ensembles. We compare against some state of the art ensembling techniques and arithmetic model averaging as a benchmark. We experiment with the M4 forecasting data set of 100,000 time-series, and the results show that the Feature-based Forecast Model Averaging (FFORMA), on average, is the best technique for late data fusion with the ES-RNN. However, considering the M4's Daily subset of data, stacking was the only successful ensemble at dealing with the case where all base model performances are similar. Our experimental results indicate that we attain state of the art forecasting results compared to N-BEATS as a benchmark. We conclude that model averaging is a more robust ensemble than model selection and stacking strategies. Further, the results show that gradient boosting is superior for implementing ensemble learning strategies.

6.6NEMar 28, 2022

Surrogate Assisted Evolutionary Multi-objective Optimisation applied to a Pressure Swing Adsorption system

Liezl Stander, Matthew Woolway, Terence L. Van Zyl

Chemical plant design and optimisation have proven challenging due to the complexity of these real-world systems. The resulting complexity translates into high computational costs for these systems' mathematical formulations and simulation models. Research has illustrated the benefits of using machine learning surrogate models as substitutes for computationally expensive models during optimisation. This paper extends recent research into optimising chemical plant design and operation. The study further explores Surrogate Assisted Genetic Algorithms (SA-GA) in more complex variants of the original plant design and optimisation problems, such as the inclusion of parallel and feedback components. The novel extension to the original algorithm proposed in this study, Surrogate Assisted NSGA-\Romannum{2} (SA-NSGA), was tested on a popular literature case, the Pressure Swing Adsorption (PSA) system. We further provide extensive experimentation, comparing various meta-heuristic optimisation techniques and numerous machine learning models as surrogates. The results for both sets of systems illustrate the benefits of using Genetic Algorithms as an optimisation framework for complex chemical plant system design and optimisation for both single and multi-objective scenarios. We confirm that Random Forest surrogate assisted Evolutionary Algorithms can be scaled to increasingly complex chemical systems with parallel and feedback components. We further find that combining a Genetic Algorithm framework with Machine Learning Surrogate models as a substitute for long-running simulation models yields significant computational efficiency improvements, 1.7 - 1.84 times speedup for the increased complexity examples and a 2.7 times speedup for the Pressure Swing Adsorption system.

3.3PMMar 10, 2022

Fusion of Sentiment and Asset Price Predictions for Portfolio Optimization

Mufhumudzi Muthivhi, Terence L. van Zyl

The fusion of public sentiment data in the form of text with stock price prediction is a topic of increasing interest within the financial community. However, the research literature seldom explores the application of investor sentiment in the Portfolio Selection problem. This paper aims to unpack and develop an enhanced understanding of the sentiment aware portfolio selection problem. To this end, the study uses a Semantic Attention Model to predict sentiment towards an asset. We select the optimal portfolio through a sentiment-aware Long Short Term Memory (LSTM) recurrent neural network for price prediction and a mean-variance strategy. Our sentiment portfolio strategies achieved on average a significant increase in revenue above the non-sentiment aware models. However, the results show that our strategy does not outperform traditional portfolio allocation strategies from a stability perspective. We argue that an improved fusion of sentiment prediction with a combination of price prediction and portfolio optimization leads to an enhanced portfolio selection strategy.

2.0LGApr 13, 2023Code

A Learnheuristic Approach to A Constrained Multi-Objective Portfolio Optimisation Problem

Sonia Bullah, Terence L. van Zyl

Multi-objective portfolio optimisation is a critical problem researched across various fields of study as it achieves the objective of maximising the expected return while minimising the risk of a given portfolio at the same time. However, many studies fail to include realistic constraints in the model, which limits practical trading strategies. This study introduces realistic constraints, such as transaction and holding costs, into an optimisation model. Due to the non-convex nature of this problem, metaheuristic algorithms, such as NSGA-II, R-NSGA-II, NSGA-III and U-NSGA-III, will play a vital role in solving the problem. Furthermore, a learnheuristic approach is taken as surrogate models enhance the metaheuristics employed. These algorithms are then compared to the baseline metaheuristic algorithms, which solve a constrained, multi-objective optimisation problem without using learnheuristics. The results of this study show that, despite taking significantly longer to run to completion, the learnheuristic algorithms outperform the baseline algorithms in terms of hypervolume and rate of convergence. Furthermore, the backtesting results indicate that utilising learnheuristics to generate weights for asset allocation leads to a lower risk percentage, higher expected return and higher Sharpe ratio than backtesting without using learnheuristics. This leads us to conclude that using learnheuristics to solve a constrained, multi-objective portfolio optimisation problem produces superior and preferable results than solving the problem without using learnheuristics.

3.7IROct 13, 2022

Multi-Modal Recommendation System with Auxiliary Information

Mufhumudzi Muthivhi, Terence L. van Zyl, Hairong Wang

Context-aware recommendation systems improve upon classical recommender systems by including, in the modelling, a user's behaviour. Research into context-aware recommendation systems has previously only considered the sequential ordering of items as contextual information. However, there is a wealth of unexploited additional multi-modal information available in auxiliary knowledge related to items. This study extends the existing research by evaluating a multi-modal recommendation system that exploits the inclusion of comprehensive auxiliary knowledge related to an item. The empirical results explore extracting vector representations (embeddings) from unstructured and structured data using data2vec. The fused embeddings are then used to train several state-of-the-art transformer architectures for sequential user-item representations. The analysis of the experimental results shows a statistically significant improvement in prediction accuracy, which confirms the effectiveness of including auxiliary information in a context-aware recommendation system. We report a 4% and 11% increase in the NDCG score for long and short user sequence datasets, respectively.

2.0LGMar 20, 2023

Late Meta-learning Fusion Using Representation Learning for Time Series Forecasting

Terence L. van Zyl

Meta-learning, decision fusion, hybrid models, and representation learning are topics of investigation with significant traction in time-series forecasting research. Of these two specific areas have shown state-of-the-art results in forecasting: hybrid meta-learning models such as Exponential Smoothing - Recurrent Neural Network (ES-RNN) and Neural Basis Expansion Analysis (N-BEATS) and feature-based stacking ensembles such as Feature-based FORecast Model Averaging (FFORMA). However, a unified taxonomy for model fusion and an empirical comparison of these hybrid and feature-based stacking ensemble approaches is still missing. This study presents a unified taxonomy encompassing these topic areas. Furthermore, the study empirically evaluates several model fusion approaches and a novel combination of hybrid and feature stacking algorithms called Deep-learning FORecast Model Averaging (DeFORMA). The taxonomy contextualises the considered methods. Furthermore, the empirical analysis of the results shows that the proposed model, DeFORMA, can achieve state-of-the-art results in the M4 data set. DeFORMA, increases the mean Overall Weighted Average (OWA) in the daily, weekly and yearly subsets with competitive results in the hourly, monthly and quarterly subsets. The taxonomy and empirical results lead us to argue that significant progress is still to be made by continuing to explore the intersection of these research areas.

4.4CVMay 15

Multi-Object Tracking Consistently Improves Wildlife Inference

Mufhumudzi Muthivhi, Jiahao Huo, Fredrik Gustafsson et al.

Camera traps have become a common tool for wildlife monitoring efforts in ecological research and biodiversity conservation. Wildlife classification models have benefited from the increase in wildlife visual data. These models reach high levels of accuracy on curated, high-quality datasets. However, their performance remains sensitive to real-world environmental constraints. They often produce inconsistent predictions when performing inference on temporally coherent sequences. The predicted label for a single individual shifts rapidly between frames. This study exploits the temporal nature of camera-trap data to augment inferred predictions from a wildlife classification model. Specifically, we adopt several standard Multi-Object Tracking (MOT) models to link detections across consecutive frames. The curated trajectories are used to fuse the softmax class probabilities. The fused probability score produces a single consensus class label estimate that overrides misclassifications caused by noise. The analysis of the experimental results shows that our proposed strategy improves over a standalone classifier over all datasets and for each metric. Specifically, the best-performing MOT models gain a weighted F1-Score of 5.1%, 3.1% and 2.0% over the classifier across three MOT datasets.

1.8LGNov 5, 2022

Towards a methodology for addressing missingness in datasets, with an application to demographic health datasets

Gift Khangamwa, Terence L. van Zyl, Clint J. van Alten

Missing data is a common concern in health datasets, and its impact on good decision-making processes is well documented. Our study's contribution is a methodology for tackling missing data problems using a combination of synthetic dataset generation, missing data imputation and deep learning methods to resolve missing data challenges. Specifically, we conducted a series of experiments with these objectives; $a)$ generating a realistic synthetic dataset, $b)$ simulating data missingness, $c)$ recovering the missing data, and $d)$ analyzing imputation performance. Our methodology used a gaussian mixture model whose parameters were learned from a cleaned subset of a real demographic and health dataset to generate the synthetic data. We simulated various missingness degrees ranging from $10 \%$, $20 \%$, $30 \%$, and $40\%$ under the missing completely at random scheme MCAR. We used an integrated performance analysis framework involving clustering, classification and direct imputation analysis. Our results show that models trained on synthetic and imputed datasets could make predictions with an accuracy of $83 \%$ and $80 \%$ on $a) $ an unseen real dataset and $b)$ an unseen reserved synthetic test dataset, respectively. Moreover, the models that used the DAE method for imputed yielded the lowest log loss an indication of good performance, even though the accuracy measures were slightly lower. In conclusion, our work demonstrates that using our methodology, one can reverse engineer a solution to resolve missingness on an unseen dataset with missingness. Moreover, though we used a health dataset, our methodology can be utilized in other contexts.

2.7NEOct 31, 2022

Exploring the effectiveness of surrogate-assisted evolutionary algorithms on the batch processing problem

Mohamed Z. Variawa, Terence L. Van Zyl, Matthew Woolway

Real-world optimisation problems typically have objective functions which cannot be expressed analytically. These optimisation problems are evaluated through expensive physical experiments or simulations. Cheap approximations of the objective function can reduce the computational requirements for solving these expensive optimisation problems. These cheap approximations may be machine learning or statistical models and are known as surrogate models. This paper introduces a simulation of a well-known batch processing problem in the literature. Evolutionary algorithms such as Genetic Algorithm (GA), Differential Evolution (DE) are used to find the optimal schedule for the simulation. We then compare the quality of solutions obtained by the surrogate-assisted versions of the algorithms against the baseline algorithms. Surrogate-assistance is achieved through Probablistic Surrogate-Assisted Framework (PSAF). The results highlight the potential for improving baseline evolutionary algorithms through surrogates. For different time horizons, the solutions are evaluated with respect to several quality indicators. It is shown that the PSAF assisted GA (PSAF-GA) and PSAF-assisted DE (PSAF-DE) provided improvement in some time horizons. In others, they either maintained the solutions or showed some deterioration. The results also highlight the need to tune the hyper-parameters used by the surrogate-assisted framework, as the surrogate, in some instances, shows some deterioration over the baseline algorithm.

2.7NESep 13, 2022Code

Pareto Driven Surrogate (ParDen-Sur) Assisted Optimisation of Multi-period Portfolio Backtest Simulations

Terence L. van Zyl, Matthew Woolway, Andrew Paskaramoorthy

Portfolio management is a multi-period multi-objective optimisation problem subject to a wide range of constraints. However, in practice, portfolio management is treated as a single-period problem partly due to the computationally burdensome hyper-parameter search procedure needed to construct a multi-period Pareto frontier. This study presents the \gls{ParDen-Sur} modelling framework to efficiently perform the required hyper-parameter search. \gls{ParDen-Sur} extends previous surrogate frameworks by including a reservoir sampling-based look-ahead mechanism for offspring generation in \glspl{EA} alongside the traditional acceptance sampling scheme. We evaluate this framework against, and in conjunction with, several seminal \gls{MO} \glspl{EA} on two datasets for both the single- and multi-period use cases. Our results show that \gls{ParDen-Sur} can speed up the exploration for optimal hyper-parameters by almost $2\times$ with a statistically significant improvement of the Pareto frontiers, across multiple \glspl{EA}, for both datasets and use cases.

0.3CLJun 21, 2022

Knowledge Graph Fusion for Language Model Fine-tuning

Nimesh Bhana, Terence L. van Zyl

Language Models such as BERT have grown in popularity due to their ability to be pre-trained and perform robustly on a wide range of Natural Language Processing tasks. Often seen as an evolution over traditional word embedding techniques, they can produce semantic representations of text, useful for tasks such as semantic similarity. However, state-of-the-art models often have high computational requirements and lack global context or domain knowledge which is required for complete language understanding. To address these limitations, we investigate the benefits of knowledge incorporation into the fine-tuning stages of BERT. An existing K-BERT model, which enriches sentences with triplets from a Knowledge Graph, is adapted for the English language and extended to inject contextually relevant information into sentences. As a side-effect, changes made to K-BERT for accommodating the English language also extend to other word-based languages. Experiments conducted indicate that injected knowledge introduces noise. We see statistically significant improvements for knowledge-driven tasks when this noise is minimised. We show evidence that, given the appropriate task, modest injection with relevant, high-quality knowledge is most performant.

6.2CVOct 20, 2025Code

Nearest-Class Mean and Logits Agreement for Wildlife Open-Set Recognition

Jiahao Huo, Mufhumudzi Muthivhi, Terence L. van Zyl et al.

Current state-of-the-art Wildlife classification models are trained under the closed world setting. When exposed to unknown classes, they remain overconfident in their predictions. Open-set Recognition (OSR) aims to classify known classes while rejecting unknown samples. Several OSR methods have been proposed to model the closed-set distribution by observing the feature, logit, or softmax probability space. A significant drawback of many existing approaches is the requirement to retrain the pre-trained classification model with the OSR-specific strategy. This study contributes a post-processing OSR method that measures the agreement between the models' features and predicted logits. We propose a probability distribution based on an input's distance to its Nearest Class Mean (NCM). The NCM-based distribution is then compared with the softmax probabilities from the logit space to measure agreement between the NCM and the classification head. Our proposed strategy ranks within the top three on two evaluated datasets, showing consistent performance across the two datasets. In contrast, current state-of-the-art methods excel on a single dataset. We achieve an AUROC of 93.41 and 95.35 for African and Swedish animals. The code can be found https://github.com/Applied-Representation-Learning-Lab/OSR.

3.6IRDec 15, 2025

BiCoRec: Bias-Mitigated Context-Aware Sequential Recommendation Model

Mufhumudzi Muthivhi, Terence L van Zyl, Hairong Wang

Sequential recommendation models aim to learn from users evolving preferences. However, current state-of-the-art models suffer from an inherent popularity bias. This study developed a novel framework, BiCoRec, that adaptively accommodates users changing preferences for popular and niche items. Our approach leverages a co-attention mechanism to obtain a popularity-weighted user sequence representation, facilitating more accurate predictions. We then present a new training scheme that learns from future preferences using a consistency loss function. BiCoRec aimed to improve the recommendation performance of users who preferred niche items. For these users, BiCoRec achieves a 26.00% average improvement in NDCG@10 over state-of-the-art baselines. When ranking the relevant item against the entire collection, BiCoRec achieves NDCG@10 scores of 0.0102, 0.0047, 0.0021, and 0.0005 for the Movies, Fashion, Games and Music datasets.

1.5CVNov 10, 2023

Comparing Male Nyala and Male Kudu Classification using Transfer Learning with ResNet-50 and VGG-16

T. T Lemani, T. L. van Zyl

Reliable and efficient monitoring of wild animals is crucial to inform management and conservation decisions. The process of manually identifying species of animals is time-consuming, monotonous, and expensive. Leveraging on advances in deep learning and computer vision, we investigate in this paper the efficiency of pre-trained models, specifically the VGG-16 and ResNet-50 model, in identifying a male Kudu and a male Nyala in their natural habitats. These pre-trained models have proven to be efficient in animal identification in general. Still, there is little research on animals like the Kudu and Nyala, who are usually well camouflaged and have similar features. The method of transfer learning used in this paper is the fine-tuning method. The models are evaluated before and after fine-tuning. The experimental results achieved an accuracy of 93.2\% and 97.7\% for the VGG-16 and ResNet-50 models, respectively, before fine-tuning and 97.7\% for both models after fine-tuning. Although these results are impressive, it should be noted that they were taken over a small sample size of 550 images split in half between the two classes; therefore, this might not cater to enough scenarios to get a full conclusion of the efficiency of the models. Therefore, there is room for more work in getting a more extensive dataset and testing and extending to the female counterparts of these species and the whole antelope species.

1.2PMMay 21, 2023Code

Machine Learning for Socially Responsible Portfolio Optimisation

Taeisha Nundlall, Terence L Van Zyl

Socially responsible investors build investment portfolios intending to incite social and environmental advancement alongside a financial return. Although Mean-Variance (MV) models successfully generate the highest possible return based on an investor's risk tolerance, MV models do not make provisions for additional constraints relevant to socially responsible (SR) investors. In response to this problem, the MV model must consider Environmental, Social, and Governance (ESG) scores in optimisation. Based on the prominent MV model, this study implements portfolio optimisation for socially responsible investors. The amended MV model allows SR investors to enter markets with competitive SR portfolios despite facing a trade-off between their investment Sharpe Ratio and the average ESG score of the portfolio.

1.8LGFeb 25, 2022

Statistics and Deep Learning-based Hybrid Model for Interpretable Anomaly Detection

Thabang Mathonsi, Terence L van Zyl

Hybrid methods have been shown to outperform pure statistical and pure deep learning methods at both forecasting tasks, and at quantifying the uncertainty associated with those forecasts (prediction intervals). One example is Multivariate Exponential Smoothing Long Short-Term Memory (MES-LSTM), a hybrid between a multivariate statistical forecasting model and a Recurrent Neural Network variant, Long Short-Term Memory. It has also been shown that a model that ($i$) produces accurate forecasts and ($ii$) is able to quantify the associated predictive uncertainty satisfactorily, can be successfully adapted to a model suitable for anomaly detection tasks. With the increasing ubiquity of multivariate data and new application domains, there have been numerous anomaly detection methods proposed in recent years. The proposed methods have largely focused on deep learning techniques, which are prone to suffer from challenges such as ($i$) large sets of parameters that may be computationally intensive to tune, $(ii)$ returning too many false positives rendering the techniques impractical for use, $(iii)$ requiring labeled datasets for training which are often not prevalent in real life, and ($iv$) understanding of the root causes of anomaly occurrences inhibited by the predominantly black-box nature of deep learning methods. In this article, an extension of MES-LSTM is presented, an interpretable anomaly detection model that overcomes these challenges. With a focus on renewable energy generation as an application domain, the proposed approach is benchmarked against the state-of-the-art. The findings are that MES-LSTM anomaly detector is at least competitive to the benchmarks at anomaly detection tasks, and less prone to learning from spurious effects than the benchmarks, thus making it more reliable at root cause discovery and explanation.

1.2PMFeb 13, 2022

Deep Reinforcement Learning and Convex Mean-Variance Optimisation for Portfolio Management

Ruan Pretorius, Terence van Zyl

Traditional portfolio management methods can incorporate specific investor preferences but rely on accurate forecasts of asset returns and covariances. Reinforcement learning (RL) methods do not rely on these explicit forecasts and are better suited for multi-stage decision processes. To address limitations of the evaluated research, experiments were conducted on three markets in different economies with different overall trends. By incorporating specific investor preferences into our RL models' reward functions, a more comprehensive comparison could be made to traditional methods in risk-return space. Transaction costs were also modelled more realistically by including nonlinear changes introduced by market volatility and trading volume. The results of this study suggest that there can be an advantage to using RL methods compared to traditional convex mean-variance optimisation methods under certain market conditions. Our RL models could significantly outperform traditional single-period optimisation (SPO) and multi-period optimisation (MPO) models in upward trending markets, but only up to specific risk limits. In sideways trending markets, the performance of SPO and MPO models can be closely matched by our RL models for the majority of the excess risk range tested. The specific market conditions under which these models could outperform each other highlight the importance of a more comprehensive comparison of Pareto optimal frontiers in risk-return space. These frontiers give investors a more granular view of which models might provide better performance for their specific risk tolerance or return targets.

5.8LGFeb 1, 2022

Investigating Transfer Learning in Graph Neural Networks

Nishai Kooverjee, Steven James, Terence van Zyl

Graph neural networks (GNNs) build on the success of deep learning models by extending them for use in graph spaces. Transfer learning has proven extremely successful for traditional deep learning problems: resulting in faster training and improved performance. Despite the increasing interest in GNNs and their use cases, there is little research on their transferability. This research demonstrates that transfer learning is effective with GNNs, and describes how source tasks and the choice of GNN impact the ability to learn generalisable knowledge. We perform experiments using real-world and synthetic data within the contexts of node classification and graph classification. To this end, we also provide a general methodology for transfer learning experimentation and present a novel algorithm for generating synthetic graph classification tasks. We compare the performance of GCN, GraphSAGE and GIN across both the synthetic and real-world datasets. Our results demonstrate empirically that GNNs with inductive operations yield statistically significantly improved transfer. Further we show that similarity in community structure between source and target tasks support statistically significant improvements in transfer over and above the use of only the node attributes.

6.5LGDec 16, 2021Code

A Statistics and Deep Learning Hybrid Method for Multivariate Time Series Forecasting and Mortality Modeling

Thabang Mathonsi, Terence L. van Zyl

Hybrid methods have been shown to outperform pure statistical and pure deep learning methods at forecasting tasks and quantifying the associated uncertainty with those forecasts (prediction intervals). One example is Exponential Smoothing Recurrent Neural Network (ES-RNN), a hybrid between a statistical forecasting model and a recurrent neural network variant. ES-RNN achieves a 9.4\% improvement in absolute error in the Makridakis-4 Forecasting Competition. This improvement and similar outperformance from other hybrid models have primarily been demonstrated only on univariate datasets. Difficulties with applying hybrid forecast methods to multivariate data include ($i$) the high computational cost involved in hyperparameter tuning for models that are not parsimonious, ($ii$) challenges associated with auto-correlation inherent in the data, as well as ($iii$) complex dependency (cross-correlation) between the covariates that may be hard to capture. This paper presents Multivariate Exponential Smoothing Long Short Term Memory (MES-LSTM), a generalized multivariate extension to ES-RNN, that overcomes these challenges. MES-LSTM utilizes a vectorized implementation. We test MES-LSTM on several aggregated coronavirus disease of 2019 (COVID-19) morbidity datasets and find our hybrid approach shows consistent, significant improvement over pure statistical and deep learning methods at forecast accuracy and prediction interval construction.

7.5LGOct 7, 2021

Multivariate Anomaly Detection based on Prediction Intervals Constructed using Deep Learning

Thabang Mathonsi, Terence L. van Zyl

It has been shown that deep learning models can under certain circumstances outperform traditional statistical methods at forecasting. Furthermore, various techniques have been developed for quantifying the forecast uncertainty (prediction intervals). In this paper, we utilize prediction intervals constructed with the aid of artificial neural networks to detect anomalies in the multivariate setting. Challenges with existing deep learning-based anomaly detection approaches include $(i)$ large sets of parameters that may be computationally intensive to tune, $(ii)$ returning too many false positives rendering the techniques impractical for use, $(iii)$ requiring labeled datasets for training which are often not prevalent in real life. Our approach overcomes these challenges. We benchmark our approach against the oft-preferred well-established statistical models. We focus on three deep learning architectures, namely, cascaded neural networks, reservoir computing and long short-term memory recurrent neural networks. Our finding is deep learning outperforms (or at the very least is competitive to) the latter.

1.6LGOct 4, 2021

Incremental Class Learning using Variational Autoencoders with Similarity Learning

Jiahao Huo, Terence L. van Zyl

Catastrophic forgetting in neural networks during incremental learning remains a challenging problem. Previous research investigated catastrophic forgetting in fully connected networks, with some earlier work exploring activation functions and learning algorithms. Applications of neural networks have been extended to include similarity learning. Understanding how similarity learning loss functions would be affected by catastrophic forgetting is of significant interest. Our research investigates catastrophic forgetting for four well-known similarity-based loss functions during incremental class learning. The loss functions are Angular, Contrastive, Center, and Triplet loss. Our results show that the catastrophic forgetting rate differs across loss functions on multiple datasets. The Angular loss was least affected, followed by Contrastive, Triplet loss, and Center loss with good mining techniques. We implemented three existing incremental learning techniques, iCaRL, EWC, and EBLL. We further proposed a novel technique using Variational Autoencoders (VAEs) to generate representation as exemplars passed through the network's intermediate layers. Our method outperformed three existing state-of-the-art techniques. We show that one does not require stored images (exemplars) for incremental learning with similarity learning. The generated representations from VAEs help preserve regions of the embedding space used by prior knowledge so that new knowledge does not ``overwrite'' it.

2.3CYSep 9, 2021

Surrogate Parameters Optimization for Data and Model Fusion of COVID-19 Time-series Data

Timilehin Ogundare, Terence Van Zyl

Our research focuses on developing a computational framework to simulate the transmission dynamics of COVID-19 pandemic. We examine the development of a system named ADRIANA for the simulation using South Africa as a case study. The design of the ADRIANA system interconnects three sub-models to establish a computational technique to advise policy regarding lockdown measures to reduce the transmission pattern of COVID-19 in South Africa. Additionally, the output of the ADRIANA can be used by healthcare administration to predict peak demand time for resources needed to treat infected individuals. ABM is suited for our research experiment, but to prevent the computational constraints of using ABM-based framework for this research, we develop an SEIR compartmental model, a discrete event simulator, and an optimized surrogate model to form a system named ADRIANA. We also ensure that the surrogate's findings are accurate enough to provide optimal solutions. We use the Genetic Algorithm (GA) for the optimization by estimating the optimal hyperparameter configuration for the surrogate. We concluded this study by discussing the solutions presented by the ADRIANA system, which aligns with the primary goal of our study to present an optimal guide to lockdown policy by the government and resource management by the hospital administrators.

4.4LGAug 19, 2021

Surrogate Assisted Strategies (The Parameterisation of an Infectious Disease Agent-Based Model)

Rylan Perumal, Terence L van Zyl

Parameter calibration is a significant challenge in agent-based modelling and simulation (ABMS). An agent-based model's (ABM) complexity grows as the number of parameters required to be calibrated increases. This parameter expansion leads to the ABMS equivalent of the \say{curse of dimensionality}. In particular, infeasible computational requirements searching an infinite parameter space. We propose a more comprehensive and adaptive ABMS Framework that can effectively swap out parameterisation strategies and surrogate models to parameterise an infectious disease ABM. This framework allows us to evaluate different strategy-surrogate combinations' performance in accuracy and efficiency (speedup). We show that we achieve better than parity in accuracy across the surrogate assisted sampling strategies and the baselines. Also, we identify that the Metric Stochastic Response Surface strategy combined with the Support Vector Machine surrogate is the best overall in getting closest to the true synthetic parameters. Also, we show that DYnamic COOrdindate Search Using Response Surface Models with XGBoost as a surrogate attains in combination the highest probability of approximating a cumulative synthetic daily infection data distribution and achieves the most significant speedup with regards to our analysis. Lastly, we show in a real-world setting that DYCORS XGBoost and MSRS SVM can approximate the real world cumulative daily infection distribution with $97.12$\% and $96.75$\% similarity respectively.

3.1LGAug 19, 2021

Feature-weighted Stacking for Nonseasonal Time Series Forecasts: A Case Study of the COVID-19 Epidemic Curves

Pieter Cawood, Terence L. van Zyl

We investigate ensembling techniques in forecasting and examine their potential for use in nonseasonal time-series similar to those in the early days of the COVID-19 pandemic. Developing improved forecast methods is essential as they provide data-driven decisions to organisations and decision-makers during critical phases. We propose using late data fusion, using a stacked ensemble of two forecasting models and two meta-features that prove their predictive power during a preliminary forecasting stage. The final ensembles include a Prophet and long short term memory (LSTM) neural network as base models. The base models are combined by a multilayer perceptron (MLP), taking into account meta-features that indicate the highest correlation with each base model's forecast accuracy. We further show that the inclusion of meta-features generally improves the ensemble's forecast accuracy across two forecast horizons of seven and fourteen days. This research reinforces previous work and demonstrates the value of combining traditional statistical models with deep learning models to produce more accurate forecast models for time-series from different domains and seasonality.

5.8LGAug 26, 2020

Surrogate Assisted Methods for the Parameterisation of Agent-Based Models

Rylan Perumal, Terence L van Zyl

Parameter calibration is a major challenge in agent-based modelling and simulation (ABMS). As the complexity of agent-based models (ABMs) increase, the number of parameters required to be calibrated grows. This leads to the ABMS equivalent of the \say{curse of dimensionality}. We propose an ABMS framework which facilitates the effective integration of different sampling methods and surrogate models (SMs) in order to evaluate how these strategies affect parameter calibration and exploration. We show that surrogate assisted methods perform better than the standard sampling methods. In addition, we show that the XGBoost and Decision Tree SMs are most optimal overall with regards to our analysis.

3.3CVJun 10, 2020

Unique Faces Recognition in Videos

Jiahao Huo, Terence L van Zyl

This paper tackles face recognition in videos employing metric learning methods and similarity ranking models. The paper compares the use of the Siamese network with contrastive loss and Triplet Network with triplet loss implementing the following architectures: Google/Inception architecture, 3D Convolutional Network (C3D), and a 2-D Long short-term memory (LSTM) Recurrent Neural Network. We make use of still images and sequences from videos for training the networks and compare the performances implementing the above architectures. The dataset used was the YouTube Face Database designed for investigating the problem of face recognition in videos. The contribution of this paper is two-fold: to begin, the experiments have established 3-D Convolutional networks and 2-D LSTMs with the contrastive loss on image sequences do not outperform Google/Inception architecture with contrastive loss in top $n$ rank face retrievals with still images. However, the 3-D Convolution networks and 2-D LSTM with triplet Loss outperform the Google/Inception with triplet loss in top $n$ rank face retrievals on the dataset; second, a Support Vector Machine (SVM) was used in conjunction with the CNNs' learned feature representations for facial identification. The results show that feature representation learned with triplet loss is significantly better for n-shot facial identification compared to contrastive loss. The most useful feature representations for facial identification are from the 2-D LSTM with triplet loss. The experiments show that learning spatio-temporal features from video sequences is beneficial for facial recognition in videos.

3.3LGJan 2, 2020

Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep Character Recognition

Nishai Kooverjee, Steven James, Terence van Zyl

Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models, and generally yields improved performance and faster training times. The technique of pre-training on one task and then retraining on a new one is called transfer learning. In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks. We perform three sets of experiments with varying levels of similarity between source and target tasks to investigate the behaviour of different types of knowledge transfer. We transfer both parameters and features and analyse their behaviour. Our results demonstrate that no significant advantage is gained by using a transfer learning approach over a traditional machine learning approach for our character recognition tasks. This suggests that using transfer learning does not necessarily presuppose a better performing model in all cases.