CRNov 1, 2023
Stacking an autoencoder for feature selection of zero-day threatsMahmut Tokmak, Mike Nkongolo
Zero-day attack detection plays a critical role in mitigating risks, protecting assets, and staying ahead in the evolving threat landscape. This study explores the application of stacked autoencoder (SAE), a type of artificial neural network, for feature selection and zero-day threat classification using a Long Short-Term Memory (LSTM) scheme. The process involves preprocessing the UGRansome dataset and training an unsupervised SAE for feature extraction. Finetuning with supervised learning is then performed to enhance the discriminative capabilities of this model. The learned weights and activations of the autoencoder are analyzed to identify the most important features for discriminating between zero-day threats and normal system behavior. These selected features form a reduced feature set that enables accurate classification. The results indicate that the SAE-LSTM performs well across all three attack categories by showcasing high precision, recall, and F1 score values, emphasizing the model's strong predictive capabilities in identifying various types of zero-day attacks. Additionally, the balanced average scores of the SAE-LSTM suggest that the model generalizes effectively and consistently across different attack categories.
CRJun 16, 2023
Fuzzy Feature Selection with Key-based Cryptographic TransformationsMike Nkongolo
In the field of cryptography, the selection of relevant features plays a crucial role in enhancing the security and efficiency of cryptographic algorithms. This paper presents a novel approach of applying fuzzy feature selection to key-based cryptographic transformations. The proposed fuzzy feature selection leverages the power of fuzzy logic to identify and select optimal subsets of features that contribute most effectively to the cryptographic transformation process. By incorporating fuzzy feature selection into key-based cryptographic transformations, this research aims to improve the resistance against attacks and enhance the overall performance of cryptographic systems. Experimental evaluations may demonstrate the effectiveness of the proposed approach in selecting secure key features with minimal computational overhead. This paper highlights the potential of fuzzy feature selection as a valuable tool in the design and optimization of key-based cryptographic algorithms, contributing to the advancement of secure information exchange and communication in various domains.
CLDec 2, 2025
TriLex: A Framework for Multilingual Sentiment Analysis in Low-Resource South African LanguagesMike Nkongolo, Hilton Vorster, Josh Warren et al.
Low-resource African languages remain underrepresented in sentiment analysis, limiting both lexical coverage and the performance of multilingual Natural Language Processing (NLP) systems. This study proposes TriLex, a three-stage retrieval augmented framework that unifies corpus-based extraction, cross lingual mapping, and retrieval augmented generation (RAG) driven lexical refinement to systematically expand sentiment lexicons for low-resource languages. Using the enriched lexicon, the performance of two prominent African pretrained language models (AfroXLMR and AfriBERTa) is evaluated across multiple case studies. Results demonstrate that AfroXLMR delivers superior performance, achieving F1-scores above 80% for isiXhosa and isiZulu and exhibiting strong cross-lingual stability. Although AfriBERTa lacks pre-training on these target languages, it still achieves reliable F1-scores around 64%, validating its utility in computationally constrained settings. Both models outperform traditional machine learning baselines, and ensemble analyses further enhance precision and robustness. The findings establish TriLex as a scalable and effective framework for multilingual sentiment lexicon expansion and sentiment modeling in low-resource South African languages.
CRAug 29, 2023
Assessing Cyclostationary Malware Detection via Feature Selection and ClassificationMike Nkongolo
Cyclostationarity involves periodic statistical variations in signals and processes, commonly used in signal analysis and network security. In the context of attacks, cyclostationarity helps detect malicious behaviors within network traffic, such as traffic patterns in Distributed Denial of Service (DDoS) attacks or hidden communication channels in malware. This approach enhances security by identifying abnormal patterns and informing Network Intrusion Detection Systems (NIDSs) to recognize potential attacks, enhancing protection against both known and novel threats. This research focuses on identifying cyclostationary malware behavior and its detection. The main goal is to pinpoint essential cyclostationary features used in NIDSs. These features are extracted using algorithms such as Boruta and Principal Component Analysis (PCA), and then categorized to find the most significant cyclostationary patterns. The aim of this article is to reveal periodically changing malware behaviors through cyclostationarity. The study highlights the importance of spotting cyclostationary malware in NIDSs by using established datasets like KDD99, NSL-KDD, and the UGRansome dataset. The UGRansome dataset is designed for anomaly detection research and includes both normal and abnormal network threat categories of zero-day attacks. A comparison is made using the Random Forest (RF) and Support Vector Machine (SVM) algorithms, while also evaluating the effectiveness of Boruta and PCA. The findings show that PCA is more promising than using Boruta alone for extracting cyclostationary network feature patterns. Additionally, the analysis identifies the internet protocol as the most noticeable cyclostationary feature pattern used by malware. Notably, the UGRansome dataset outperforms the KDD99 and NSL-KDD, achieving 99% accuracy in signature malware detection using the RF algorithm and 98% with the SVM.
LGFeb 17, 2024
Ransomware detection using stacked autoencoder for feature selectionMike Nkongolo, Mahmut Tokmak
The aim of this study is to propose and evaluate an advanced ransomware detection and classification method that combines a Stacked Autoencoder (SAE) for precise feature selection with a Long Short Term Memory (LSTM) classifier to enhance ransomware stratification accuracy. The proposed approach involves thorough pre processing of the UGRansome dataset and training an unsupervised SAE for optimal feature selection or fine tuning via supervised learning to elevate the LSTM model's classification capabilities. The study meticulously analyzes the autoencoder's learned weights and activations to identify essential features for distinguishing ransomware families from other malware and creates a streamlined feature set for precise classification. Extensive experiments, including up to 400 epochs and varying learning rates, are conducted to optimize the model's performance. The results demonstrate the outstanding performance of the SAE-LSTM model across all ransomware families, boasting high precision, recall, and F1 score values that underscore its robust classification capabilities. Furthermore, balanced average scores affirm the proposed model's ability to generalize effectively across various malware types. The proposed model achieves an exceptional 99% accuracy in ransomware classification, surpassing the Extreme Gradient Boosting (XGBoost) algorithm primarily due to its effective SAE feature selection mechanism. The model also demonstrates outstanding performance in identifying signature attacks, achieving a 98% accuracy rate.