CVSep 4, 2024Code
One Homography is All You Need: IMM-based Joint Homography and Multiple Object State EstimationPaul Johannes Claasen, Johan Pieter de Villiers
A novel online MOT algorithm, IMM Joint Homography State Estimation (IMM-JHSE), is proposed. IMM-JHSE uses an initial homography estimate as the only additional 3D information, whereas other 3D MOT methods use regular 3D measurements. By jointly modelling the homography matrix and its dynamics as part of track state vectors, IMM-JHSE removes the explicit influence of camera motion compensation techniques on predicted track position states, which was prevalent in previous approaches. Expanding upon this, static and dynamic camera motion models are combined using an IMM filter. A simple bounding box motion model is used to predict bounding box positions to incorporate image plane information. In addition to applying an IMM to camera motion, a non-standard IMM approach is applied where bounding-box-based BIoU scores are mixed with ground-plane-based Mahalanobis distances in an IMM-like fashion to perform association only, making IMM-JHSE robust to motion away from the ground plane. Finally, IMM-JHSE makes use of dynamic process and measurement noise estimation techniques. IMM-JHSE improves upon related techniques, including UCMCTrack, OC-SORT, C-BIoU and ByteTrack on the DanceTrack and KITTI-car datasets, increasing HOTA by 2.64 and 2.11, respectively, while offering competitive performance on the MOT17, MOT20 and KITTI-pedestrian datasets. Using publicly available detections, IMM-JHSE outperforms almost all other 2D MOT methods and is outperformed only by 3D MOT methods -- some of which are offline -- on the KITTI-car dataset. Compared to tracking-by-attention methods, IMM-JHSE shows remarkably similar performance on the DanceTrack dataset and outperforms them on the MOT17 dataset. The code is publicly available: https://github.com/Paulkie99/imm-jhse.
LGNov 1, 2022
Comparision Of Adversarial And Non-Adversarial LSTM Music Generative ModelsMoseli Mots'oehli, Anna Sergeevna Bosman, Johan Pieter De Villiers
Algorithmic music composition is a way of composing musical pieces with minimal to no human intervention. While recurrent neural networks are traditionally applied to many sequence-to-sequence prediction tasks, including successful implementations of music composition, their standard supervised learning approach based on input-to-output mapping leads to a lack of note variety. These models can therefore be seen as potentially unsuitable for tasks such as music generation. Generative adversarial networks learn the generative distribution of data and lead to varied samples. This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data. The resulting music samples are evaluated by human listeners, their preferences recorded. The evaluation indicates that adversarial training produces more aesthetically pleasing music.
LGFeb 27, 2025
Online Meta-learning for AutoML in Real-time (OnMAR)Mia Gerber, Anna Sergeevna Bosman, Johan Pieter de Villiers
Automated machine learning (AutoML) is a research area focusing on using optimisation techniques to design machine learning (ML) algorithms, alleviating the need for a human to perform manual algorithm design. Real-time AutoML enables the design process to happen while the ML algorithm is being applied to a task. Real-time AutoML is an emerging research area, as such existing real-time AutoML techniques need improvement with respect to the quality of designs and time taken to create designs. To address these issues, this study proposes an Online Meta-learning for AutoML in Real-time (OnMAR) approach. Meta-learning gathers information about the optimisation process undertaken by the ML algorithm in the form of meta-features. Meta-features are used in conjunction with a meta-learner to optimise the optimisation process. The OnMAR approach uses a meta-learner to predict the accuracy of an ML design. If the accuracy predicted by the meta-learner is sufficient, the design is used, and if the predicted accuracy is low, an optimisation technique creates a new design. A genetic algorithm (GA) is the optimisation technique used as part of the OnMAR approach. Different meta-learners (k-nearest neighbours, random forest and XGBoost) are tested. The OnMAR approach is model-agnostic (i.e. not specific to a single real-time AutoML application) and therefore evaluated on three different real-time AutoML applications, namely: composing an image clustering algorithm, configuring the hyper-parameters of a convolutional neural network, and configuring a video classification pipeline. The OnMAR approach is effective, matching or outperforming existing real-time AutoML approaches, with the added benefit of a faster runtime.
IRNov 21, 2020
The complementarity of a diverse range of deep learning features extracted from video content for video recommendationAdolfo Almeida, Johan Pieter de Villiers, Allan De Freitas et al.
Following the popularisation of media streaming, a number of video streaming services are continuously buying new video content to mine the potential profit from them. As such, the newly added content has to be handled well to be recommended to suitable users. In this paper, we address the new item cold-start problem by exploring the potential of various deep learning features to provide video recommendations. The deep learning features investigated include features that capture the visual-appearance, audio and motion information from video content. We also explore different fusion methods to evaluate how well these feature modalities can be combined to fully exploit the complementary information captured by them. Experiments on a real-world video dataset for movie recommendations show that deep learning features outperform hand-crafted features. In particular, recommendations generated with deep learning audio features and action-centric deep learning features are superior to MFCC and state-of-the-art iDT features. In addition, the combination of various deep learning features with hand-crafted features and textual metadata yields significant improvement in recommendations compared to combining only the former.
LGFeb 26, 2020
Deep Learning and Statistical Models for Time-Critical Pedestrian Behaviour PredictionJoel Janek Dabrowski, Johan Pieter de Villiers, Ashfaqur Rahman et al.
The time it takes for a classifier to make an accurate prediction can be crucial in many behaviour recognition problems. For example, an autonomous vehicle should detect hazardous pedestrian behaviour early enough for it to take appropriate measures. In this context, we compare the switching linear dynamical system (SLDS) and a three-layered bi-directional long short-term memory (LSTM) neural network, which are applied to infer pedestrian behaviour from motion tracks. We show that, though the neural network model achieves an accuracy of 80%, it requires long sequences to achieve this (100 samples or more). The SLDS, has a lower accuracy of 74%, but it achieves this result with short sequences (10 samples). To our knowledge, such a comparison on sequence length has not been considered in the literature before. The results provide a key intuition of the suitability of the models in time-critical problems.