LG | Aug 19, 2023

Semi-Supervised Anomaly Detection for the Determination of Vehicle Hijacking Tweets

Taahir Aiyoob Patel, Clement N. Nyirenda

arXiv:2308.10036v1 | h-index: 8

Originality: Synthesis-oriented

AI Analysis

This work addresses real-time hijacking detection for public safety in South Africa, but the contribution is incremental, since it applies existing anomaly detection methods to a new domain.

The paper tackled the problem of detecting vehicle hijacking incidents from tweets in South Africa using semi-supervised anomaly detection, achieving up to 90% accuracy and an F1-score of 0.8 with the CBLOF method.

In South Africa, vehicle hijackings are an ever-growing problem, and travellers live in constant fear of becoming victims of such incidents. This work presents a new semi-supervised approach that uses tweets to identify hijacking incidents by means of unsupervised anomaly detection algorithms. Tweets containing the keyword "hijacking" are obtained, stored, and processed using term frequency-inverse document frequency (TF-IDF), then analysed with two anomaly detection algorithms: 1) K-Nearest Neighbour (KNN); 2) Cluster-Based Local Outlier Factor (CBLOF). The comparative evaluation showed that the KNN method produced an accuracy of 89%, whereas CBLOF produced an accuracy of 90%. CBLOF also obtained an F1-score of 0.8, whereas KNN obtained 0.78. There is therefore a slight difference between the two approaches, in favour of CBLOF, which has been selected as the preferred unsupervised method for determining relevant hijacking tweets. In future, the unsupervised methods presented in this work will be compared with supervised learning methods on a larger dataset. Optimisation mechanisms will also be employed to increase overall performance.
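The TF-IDF and KNN steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the example tweets are invented, confirmed hijacking reports are treated as the "normal" class, the IDF uses a common smoothed form, unseen words are given the largest possible IDF, and the threshold is simply the largest leave-one-out score on the reference tweets. The abstract does not say which implementation or settings the authors used, and CBLOF is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed IDF per term, plus a fallback IDF for terms never seen in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    unseen_idf = math.log(1 + n) + 1.0  # assumption: treat unseen terms as df = 0
    return idf, unseen_idf


def tfidf_vector(doc, idf, unseen_idf):
    """Sparse, L2-normalised TF-IDF vector stored as a dict."""
    tf = Counter(tokenize(doc))
    vec = {t: c * idf.get(t, unseen_idf) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def distance(a, b):
    """Euclidean distance between two sparse vectors."""
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b)))


def knn_score(vec, reference_vecs, k=2):
    """Outlier score: distance to the k-th nearest reference vector."""
    return sorted(distance(vec, r) for r in reference_vecs)[k - 1]


# Hypothetical reference tweets, all assumed to be genuine hijacking reports.
reference_tweets = [
    "hijacking reported on the n1 highway near cape town armed suspects fled in a white vehicle",
    "armed suspects in vehicle hijacking on the n2 highway near durban police alerted",
    "vehicle hijacking near cape town on the n1 armed suspects fled police alerted",
    "police alerted after armed hijacking of white vehicle on highway near durban",
]

idf, unseen_idf = fit_idf(reference_tweets)
reference_vecs = [tfidf_vector(t, idf, unseen_idf) for t in reference_tweets]

# Assumed threshold rule: the largest leave-one-out score among the reference tweets.
threshold = max(
    knn_score(v, reference_vecs[:i] + reference_vecs[i + 1:])
    for i, v in enumerate(reference_vecs)
)


def is_anomaly(tweet):
    """Flag a tweet whose k-NN score exceeds the reference threshold."""
    return knn_score(tfidf_vector(tweet, idf, unseen_idf), reference_vecs) > threshold


relevant = "armed suspects fled after vehicle hijacking on the n1 highway near cape town"
irrelevant = "this movie is hijacking my whole weekend lol"
relevant_score = knn_score(tfidf_vector(relevant, idf, unseen_idf), reference_vecs)
irrelevant_score = knn_score(tfidf_vector(irrelevant, idf, unseen_idf), reference_vecs)
print(relevant_score, irrelevant_score, threshold)
```

Both example tweets contain the keyword "hijacking", but only the second uses it figuratively; it shares almost no vocabulary with the reference reports, so it lies far from its nearest neighbours and is flagged. On a real dataset the value of k and the threshold rule would need tuning against labelled examples.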
