LGAug 22, 2023
Performance Comparison and Implementation of Bayesian Variants for Network Intrusion DetectionTosin Ige, Christopher Kiekintveld
Bayesian classifiers perform well when each of the features is completely independent of the other which is not always valid in real world application. The aim of this study is to implement and compare the performances of each variant of Bayesian classifier (Multinomial, Bernoulli, and Gaussian) on anomaly detection in network intrusion, and to investigate whether there is any association between each variant assumption and their performance. Our investigation showed that each variant of Bayesian algorithm blindly follows its assumption regardless of feature property, and that the assumption is the single most important factor that influences their accuracy. Experimental results show that Bernoulli has accuracy of 69.9% test (71% train), Multinomial has accuracy of 31.2% test (31.2% train), while Gaussian has accuracy of 81.69% test (82.84% train). Going deeper, we investigated and found that each Naive Bayes variants performances and accuracy is largely due to each classifier assumption, Gaussian classifier performed best on anomaly detection due to its assumption that features follow normal distributions which are continuous, while multinomial classifier have a dismal performance as it simply assumes discreet and multinomial distribution.
CRDec 19, 2025
MAD-OOD: A Deep Learning Cluster-Driven Framework for an Out-of-Distribution Malware Detection and ClassificationTosin Ige, Christopher Kiekintveld, Aritran Piplai et al.
Out of distribution (OOD) detection remains a critical challenge in malware classification due to the substantial intra family variability introduced by polymorphic and metamorphic malware variants. Most existing deep learning based malware detectors rely on closed world assumptions and fail to adequately model this intra class variation, resulting in degraded performance when confronted with previously unseen malware families. This paper presents MADOOD, a novel two stage, cluster driven deep learning framework for robust OOD malware detection and classification. In the first stage, malware family embeddings are modeled using class conditional spherical decision boundaries derived from Gaussian Discriminant Analysis (GDA), enabling statistically grounded separation of indistribution and OOD samples without requiring OOD data during training. Z score based distance analysis across multiple class centroids is employed to reliably identify anomalous samples in the latent space. In the second stage, a deep neural network integrates cluster based predictions, refined embeddings, and supervised classifier outputs to enhance final classification accuracy. Extensive evaluations on benchmark malware datasets comprising 25 known families and multiple novel OOD variants demonstrate that MADOOD significantly outperforms state of the art OOD detection methods, achieving an AUC of up to 0.911 on unseen malware families. The proposed framework provides a scalable, interpretable, and statistically principled solution for real world malware detection and anomaly identification in evolving cybersecurity environments.
AINov 28, 2023
Generation of Games for Opponent Model DifferentiationDavid Milec, Viliam Lisý, Christopher Kiekintveld
Protecting against adversarial attacks is a common multiagent problem. Attackers in the real world are predominantly human actors, and the protection methods often incorporate opponent models to improve the performance when facing humans. Previous results show that modeling human behavior can significantly improve the performance of the algorithms. However, modeling humans correctly is a complex problem, and the models are often simplified and assume humans make mistakes according to some distribution or train parameters for the whole population from which they sample. In this work, we use data gathered by psychologists who identified personality types that increase the likelihood of performing malicious acts. However, in the previous work, the tests on a handmade game could not show strategic differences between the models. We created a novel model that links its parameters to psychological traits. We optimized over parametrized games and created games in which the differences are profound. Our work can help with automatic game generation when we need a game in which some models will behave differently and to identify situations in which the models do not align.
CRNov 24, 2024
An investigation into the performances of the Current state-of-the-art Naive Bayes, Non-Bayesian and Deep Learning Based Classifier for Phishing Detection: A SurveyTosin Ige, Christopher Kiekintveld, Aritran Piplai et al.
Phishing is one of the most effective ways in which cybercriminals get sensitive details such as credentials for online banking, digital wallets, state secrets, and many more from potential victims. They do this by spamming users with malicious URLs with the sole purpose of tricking them into divulging sensitive information which is later used for various cybercrimes. In this research, we did a comprehensive review of current state-of-the-art machine learning and deep learning phishing detection techniques to expose their vulnerabilities and future research direction. For better analysis and observation, we split machine learning techniques into Bayesian, non-Bayesian, and deep learning. We reviewed the most recent advances in Bayesian and non-Bayesian-based classifiers before exploiting their corresponding weaknesses to indicate future research direction. While exploiting weaknesses in both Bayesian and non-Bayesian classifiers, we also compared each performance with a deep learning classifier. For a proper review of deep learning-based classifiers, we looked at Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and Long Short Term Memory Networks (LSTMs). We did an empirical analysis to evaluate the performance of each classifier along with many of the proposed state-of-the-art anti-phishing techniques to identify future research directions, we also made a series of proposals on how the performance of the under-performing algorithm can improved in addition to a two-stage prediction model
CRFeb 27, 2024
Deep Learning-Based Speech and Vision Synthesis to Improve Phishing Attack Detection through a Multi-layer Adaptive FrameworkTosin Ige, Christopher Kiekintveld, Aritran Piplai
The ever-evolving ways attacker continues to im prove their phishing techniques to bypass existing state-of-the-art phishing detection methods pose a mountain of challenges to researchers in both industry and academia research due to the inability of current approaches to detect complex phishing attack. Thus, current anti-phishing methods remain vulnerable to complex phishing because of the increasingly sophistication tactics adopted by attacker coupled with the rate at which new tactics are being developed to evade detection. In this research, we proposed an adaptable framework that combines Deep learning and Randon Forest to read images, synthesize speech from deep-fake videos, and natural language processing at various predictions layered to significantly increase the performance of machine learning models for phishing attack detection.
CRFeb 26, 2024
An Investigation into the Performances of the State-of-the-art Machine Learning Approaches for Various Cyber-attack Detection: A SurveyTosin Ige, Christopher Kiekintveld, Aritran Piplai
In this research, we analyzed the suitability of each of the current state-of-the-art machine learning models for various cyberattack detection from the past 5 years with a major emphasis on the most recent works for comparative study to identify the knowledge gap where work is still needed to be done with regard to detection of each category of cyberattack. We also reviewed the suitability, effeciency and limitations of recent research on state-of-the-art classifiers and novel frameworks in the detection of differnet cyberattacks. Our result shows the need for; further research and exploration on machine learning approach for the detection of drive-by download attacks, an investigation into the mix performance of Naive Bayes to identify possible research direction on improvement to existing state-of-the-art Naive Bayes classifier, we also identify that current machine learning approach to the detection of SQLi attack cannot detect an already compromised database with SQLi attack signifying another possible future research direction.
CRMay 11, 2021
Survey and Taxonomy of Adversarial Reconnaissance TechniquesShanto Roy, Nazia Sharmin, Jaime C. Acosta et al.
Adversaries are often able to penetrate networks and compromise systems by exploiting vulnerabilities in people and systems. The key to the success of these attacks is information that adversaries collect throughout the phases of the cyber kill chain. We summarize and analyze the methods, tactics, and tools that adversaries use to conduct reconnaissance activities throughout the attack process. First, we discuss what types of information adversaries seek, and how and when they can obtain this information. Then, we provide a taxonomy and detailed overview of adversarial reconnaissance techniques. The taxonomy introduces a categorization of reconnaissance techniques based on the source as third-party, human-, and system-based information gathering. This paper provides a comprehensive view of adversarial reconnaissance that can help in understanding and modeling this complex but vital aspect of cyber attacks as well as insights that can improve defensive strategies, such as cyber deception.
CRFeb 21, 2020
Optimizing Vulnerability-Driven Honey Traffic Using Game TheoryIffat Anjum, Mohammad Sujan Miah, Mu Zhu et al.
Enterprises are increasingly concerned about adversaries that slowly and deliberately exploit resources over the course of months or even years. A key step in this kill chain is network reconnaissance, which has historically been active (e.g., network scans) and therefore detectable. However, new networking technology increases the possibility of passive network reconnaissance, which will be largely undetectable by defenders. In this paper, we propose Snaz, a technique that uses deceptively crafted honey traffic to confound the knowledge gained through passive network reconnaissance. We present a two-player non-zero-sum Stackelberg game model that characterizes how a defender should deploy honey traffic in the presence of an adversary who is aware of Snaz. In doing so, we demonstrate the existence of optimal defender strategies that will either dissuade an adversary from acting on the existence of real vulnerabilities observed within network traffic, or reveal the adversary's presence when it attempts to unknowingly attack an intrusion detection node.
CVDec 12, 2019
Local Context Normalization: Revisiting Local NormalizationAnthony Ortiz, Caleb Robinson, Dan Morris et al.
Normalization layers have been shown to improve convergence in deep neural networks, and even add useful inductive biases. In many vision applications the local spatial context of the features is important, but most common normalization schemes including Group Normalization (GN), Instance Normalization (IN), and Layer Normalization (LN) normalize over the entire spatial dimension of a feature. This can wash out important signals and degrade performance. For example, in applications that use satellite imagery, input images can be arbitrarily large; consequently, it is nonsensical to normalize over the entire area. Positional Normalization (PN), on the other hand, only normalizes over a single spatial position at a time. A natural compromise is to normalize features by local context, while also taking into account group level information. In this paper, we propose Local Context Normalization (LCN): a normalization layer where every feature is normalized based on a window around it and the filters in its group. We propose an algorithmic solution to make LCN efficient for arbitrary window sizes, even if every point in the image has a unique window. LCN outperforms its Batch Normalization (BN), GN, IN, and LN counterparts for object detection, semantic segmentation, and instance segmentation applications in several benchmark datasets, while keeping performance independent of the batch size and facilitating transfer learning.