CRApr 26, 2023
SHIELD: Thwarting Code Authorship AttributionMohammed Abuhamad, Changhun Jung, David Mohaisen et al.
Authorship attribution has become increasingly accurate, posing a serious privacy risk for programmers who wish to remain anonymous. In this paper, we introduce SHIELD to examine the robustness of different code authorship attribution approaches against adversarial code examples. We define four attacks on attribution techniques, which include targeted and non-targeted attacks, and realize them using adversarial code perturbation. We experiment with a dataset of 200 programmers from the Google Code Jam competition to validate our methods targeting six state-of-the-art authorship attribution methods that adopt a variety of techniques for extracting authorship traits from source-code, including RNN, CNN, and code stylometry. Our experiments demonstrate the vulnerability of current authorship attribution methods against adversarial attacks. For the non-targeted attack, our experiments demonstrate the vulnerability of current authorship attribution methods against the attack with an attack success rate exceeds 98.5\% accompanied by a degradation of the identification confidence that exceeds 13\%. For the targeted attacks, we show the possibility of impersonating a programmer using targeted-adversarial perturbations with a success rate ranging from 66\% to 88\% for different authorship attribution techniques under several adversarial scenarios.
CRNov 26, 2023
Untargeted Code Authorship Evasion with Seq2Seq TransformationSoohyeon Choi, Rhongho Jang, DaeHun Nyang et al.
Code authorship attribution is the problem of identifying authors of programming language codes through the stylistic features in their codes, a topic that recently witnessed significant interest with outstanding performance. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE customizes StructCoder, a system designed initially for function-level code translation from one language to another (e.g., Java to C#), using transfer learning. SCAE improved the efficiency at a slight accuracy degradation compared to existing work. We also reduced the processing time by about 68% while maintaining an 85% transformation success rate and up to 95.77% evasion success rate in the untargeted setting.
CRApr 29, 2025
Enhancing Vulnerability Reports with Automated and Augmented Description SummarizationHattan Althebeiti, Mohammed Alkinoon, Manar Mohaisen et al.
Public vulnerability databases, such as the National Vulnerability Database (NVD), document vulnerabilities and facilitate threat information sharing. However, they often suffer from short descriptions and outdated or insufficient information. In this paper, we introduce Zad, a system designed to enrich NVD vulnerability descriptions by leveraging external resources. Zad consists of two pipelines: one collects and filters supplementary data using two encoders to build a detailed dataset, while the other fine-tunes a pre-trained model on this dataset to generate enriched descriptions. By addressing brevity and improving content quality, Zad produces more comprehensive and cohesive vulnerability descriptions. We evaluate Zad using standard summarization metrics and human assessments, demonstrating its effectiveness in enhancing vulnerability information.
CLJan 3, 2022
Robust Natural Language Processing: Recent Advances, Challenges, and Future DirectionsMarwan Omar, Soohyeon Choi, DaeHun Nyang et al.
Recent natural language processing (NLP) techniques have accomplished high performance on benchmark datasets, primarily due to the significant improvement in the performance of deep learning. The advances in the research community have led to great enhancements in state-of-the-art production systems for NLP tasks, such as virtual assistants, speech recognition, and sentiment analysis. However, such NLP systems still often fail when tested with adversarial attacks. The initial lack of robustness exposed troubling gaps in current models' language understanding capabilities, creating problems when NLP systems are deployed in real life. In this paper, we present a structured overview of NLP robustness research by summarizing the literature in a systemic way across various dimensions. We then take a deep-dive into the various dimensions of robustness, across techniques, metrics, embeddings, and benchmarks. Finally, we argue that robustness should be multi-dimensional, provide insights into current research, identify gaps in the literature to suggest directions worth pursuing to address these gaps.
CRAug 30, 2021
ML-based IoT Malware Detection Under Adversarial Settings: A Systematic EvaluationAhmed Abusnaina, Afsah Anwar, Sultan Alshamrani et al.
The rapid growth of the Internet of Things (IoT) devices is paralleled by them being on the front-line of malicious attacks. This has led to an explosion in the number of IoT malware, with continued mutations, evolution, and sophistication. These malicious software are detected using machine learning (ML) algorithms alongside the traditional signature-based methods. Although ML-based detectors improve the detection performance, they are susceptible to malware evolution and sophistication, making them limited to the patterns that they have been trained upon. This continuous trend motivates the large body of literature on malware analysis and detection research, with many systems emerging constantly, and outperforming their predecessors. In this work, we systematically examine the state-of-the-art malware detection approaches, that utilize various representation and learning techniques, under a range of adversarial settings. Our analyses highlight the instability of the proposed detectors in learning patterns that distinguish the benign from the malicious software. The results exhibit that software mutations with functionality-preserving operations, such as stripping and padding, significantly deteriorate the accuracy of such detectors. Additionally, our analysis of the industry-standard malware detectors shows their instability to the malware mutations.
CRMar 26, 2021
ShellCore: Automating Malicious IoT Software Detection by Using Shell Commands RepresentationHisham Alasmary, Afsah Anwar, Ahmed Abusnaina et al.
The Linux shell is a command-line interpreter that provides users with a command interface to the operating system, allowing them to perform a variety of functions. Although very useful in building capabilities at the edge, the Linux shell can be exploited, giving adversaries a prime opportunity to use them for malicious activities. With access to IoT devices, malware authors can abuse the Linux shell of those devices to propagate infections and launch large-scale attacks, e.g., DDoS. In this work, we provide a first look at shell commands used in Linux-based IoT malware towards detection. We analyze malicious shell commands found in IoT malware and build a neural network-based model, ShellCore, to detect malicious shell commands. Namely, we collected a large dataset of shell commands, including malicious commands extracted from 2,891 IoT malware samples and benign commands collected from real-world network traffic analysis and volunteered data from Linux users. Using conventional machine and deep learning-based approaches trained with term- and character-level features, ShellCore is shown to achieve an accuracy of more than 99% in detecting malicious shell commands and files (i.e., binaries).
CRMar 26, 2021
Understanding Internet of Things Malware by Analyzing Endpoints in their Static ArtifactsAfsah Anwar, Jinchun Choi, Abdulrahman Alabduljabbar et al.
The lack of security measures among the Internet of Things (IoT) devices and their persistent online connection gives adversaries a prime opportunity to target them or even abuse them as intermediary targets in larger attacks such as distributed denial-of-service (DDoS) campaigns. In this paper, we analyze IoT malware and focus on the endpoints reachable on the public Internet, that play an essential part in the IoT malware ecosystem. Namely, we analyze endpoints acting as dropzones and their targets to gain insights into the underlying dynamics in this ecosystem, such as the affinity between the dropzones and their target IP addresses, and the different patterns among endpoints. Towards this goal, we reverse-engineer 2,423 IoT malware samples and extract strings from them to obtain IP addresses. We further gather information about these endpoints from public Internet-wide scanners, such as Shodan and Censys. For the masked IP addresses, we examine the Classless Inter-Domain Routing (CIDR) networks accumulating to more than 100 million (78.2% of total active public IPv4 addresses) endpoints. Our investigation from four different perspectives provides profound insights into the role of endpoints in IoT malware attacks, which deepens our understanding of IoT malware ecosystems and can assist future defenses.
CYMar 3, 2021
Hate, Obscenity, and Insults: Measuring the Exposure of Children to Inappropriate Comments in YouTubeSultan Alshamrani, Ahmed Abusnaina, Mohammed Abuhamad et al.
Social media has become an essential part of the daily routines of children and adolescents. Moreover, enormous efforts have been made to ensure the psychological and emotional well-being of young users as well as their safety when interacting with various social media platforms. In this paper, we investigate the exposure of those users to inappropriate comments posted on YouTube videos targeting this demographic. We collected a large-scale dataset of approximately four million records and studied the presence of five age-inappropriate categories and the amount of exposure to each category. Using natural language processing and machine learning techniques, we constructed ensemble classifiers that achieved high accuracy in detecting inappropriate comments. Our results show a large percentage of worrisome comments with inappropriate content: we found 11% of the comments on children's videos to be toxic, highlighting the importance of monitoring comments, particularly on children's platforms.
CRJan 1, 2021
e-PoS: Making Proof-of-Stake Decentralized and FairMuhammad Saad, Zhan Qin, Kui Ren et al.
Blockchain applications that rely on the Proof-of-Work (PoW) have increasingly become energy inefficient with a staggering carbon footprint. In contrast, energy-efficient alternative consensus protocols such as Proof-of-Stake (PoS) may cause centralization and unfairness in the blockchain system. To address these challenges, we propose a modular version of PoS-based blockchain systems called epos that resists the centralization of network resources by extending mining opportunities to a wider set of stakeholders. Moreover, epos leverages the in-built system operations to promote fair mining practices by penalizing malicious entities. We validate epos's achievable objectives through theoretical analysis and simulations. Our results show that epos ensures fairness and decentralization, and can be applied to existing blockchain applications.
CVJun 30, 2020
Generating Adversarial Examples with an Optimized QualityAminollah Khormali, DaeHun Nyang, David Mohaisen
Deep learning models are widely used in a range of application areas, such as computer vision, computer security, etc. However, deep learning models are vulnerable to Adversarial Examples (AEs),carefully crafted samples to deceive those models. Recent studies have introduced new adversarial attack methods, but, to the best of our knowledge, none provided guaranteed quality for the crafted examples as part of their creation, beyond simple quality measures such as Misclassification Rate (MR). In this paper, we incorporateImage Quality Assessment (IQA) metrics into the design and generation process of AEs. We propose an evolutionary-based single- and multi-objective optimization approaches that generate AEs with high misclassification rate and explicitly improve the quality, thus indistinguishability, of the samples, while perturbing only a limited number of pixels. In particular, several IQA metrics, including edge analysis, Fourier analysis, and feature descriptors, are leveraged into the process of generating AEs. Unique characteristics of the evolutionary-based algorithm enable us to simultaneously optimize the misclassification rate and the IQA metrics of the AEs. In order to evaluate the performance of the proposed method, we conduct intensive experiments on different well-known benchmark datasets(MNIST, CIFAR, GTSRB, and Open Image Dataset V5), while considering various objective optimization configurations. The results obtained from our experiments, when compared with the exist-ing attack methods, validate our initial hypothesis that the use ofIQA metrics within generation process of AEs can substantially improve their quality, while maintaining high misclassification rate.Finally, transferability and human perception studies are provided, demonstrating acceptable performance.
CRMay 14, 2020
A Deep Learning-based Fine-grained Hierarchical Learning Approach for Robust Malware ClassificationAhmed Abusnaina, Mohammed Abuhamad, Hisham Alasmary et al.
The wide acceptance of Internet of Things (IoT) for both household and industrial applications is accompanied by several security concerns. A major security concern is their probable abuse by adversaries towards their malicious intent. Understanding and analyzing IoT malicious behaviors is crucial, especially with their rapid growth and adoption in wide-range of applications. However, recent studies have shown that machine learning-based approaches are susceptible to adversarial attacks by adding junk codes to the binaries, for example, with an intention to fool those machine learning or deep learning-based detection systems. Realizing the importance of addressing this challenge, this study proposes a malware detection system that is robust to adversarial attacks. To do so, examine the performance of the state-of-the-art methods against adversarial IoT software crafted using the graph embedding and augmentation techniques. In particular, we study the robustness of such methods against two black-box adversarial methods, GEA and SGEA, to generate Adversarial Examples (AEs) with reduced overhead, and keeping their practicality intact. Our comprehensive experimentation with GEA-based AEs show the relation between misclassification and the graph size of the injected sample. Upon optimization and with small perturbation, by use of SGEA, all the IoT malware samples are misclassified as benign. This highlights the vulnerability of current detection systems under adversarial settings. With the landscape of possible adversarial attacks, we then propose DL-FHMC, a fine-grained hierarchical learning approach for malware detection and classification, that is robust to AEs with a capability to detect 88.52% of the malicious AEs.
CRMay 11, 2020
Contra-*: Mechanisms for Countering Spam Attacks on Blockchain's Memory PoolsMuhammad Saad, Joongheon Kim, DaeHun Nyang et al.
Blockchain-based cryptocurrencies, such as Bitcoin, have seen on the rise in their popularity and value, making them a target to several forms of Denial-of-Service (DoS) attacks, and calling for a better understanding of their attack surface from both security and distributed systems standpoints. In this paper, and in the pursuit of understanding the attack surface of blockchains, we explore a new form of attack that can be carried out on the memory pools (mempools) and mainly targets blockchain-based cryptocurrencies. We study this attack on Bitcoin mempool and explore the attack effects on transactions fee paid by benign users. To counter this attack, this paper further proposes Contra-*:, a set of countermeasures utilizing fee, age, and size (thus, Contra-F, Contra-A, and Contra-S) as prioritization mechanisms. Contra-*: optimize the mempool size and help in countering the effects of DoS attacks due to spam transactions. We evaluate Contra-* by simulations and analyze their effectiveness under various attack conditions.
CRJan 23, 2020
Sensor-based Continuous Authentication of Smartphones' Users Using Behavioral Biometrics: A Contemporary SurveyMohammed Abuhamad, Ahmed Abusnaina, DaeHun Nyang et al.
Mobile devices and technologies have become increasingly popular, offering comparable storage and computational capabilities to desktop computers allowing users to store and interact with sensitive and private information. The security and protection of such personal information are becoming more and more important since mobile devices are vulnerable to unauthorized access or theft. User authentication is a task of paramount importance that grants access to legitimate users at the point-of-entry and continuously through the usage session. This task is made possible with today's smartphones' embedded sensors that enable continuous and implicit user authentication by capturing behavioral biometrics and traits. In this paper, we survey more than 140 recent behavioral biometric-based approaches for continuous user authentication, including motion-based methods (28 studies), gait-based methods (19 studies), keystroke dynamics-based methods (20 studies), touch gesture-based methods (29 studies), voice-based methods (16 studies), and multimodal-based methods (34 studies). The survey provides an overview of the current state-of-the-art approaches for continuous user authentication using behavioral biometrics captured by smartphones' embedded sensors, including insights and open challenges for adoption, usability, and performance.
IVOct 2, 2019
W-Net: A CNN-based Architecture for White Blood Cells Image ClassificationChanghun Jung, Mohammed Abuhamad, Jumabek Alikhanov et al.
Computer-aided methods for analyzing white blood cells (WBC) have become widely popular due to the complexity of the manual process. Recent works have shown highly accurate segmentation and detection of white blood cells from microscopic blood images. However, the classification of the observed cells is still a challenge and highly demanded as the distribution of the five types reflects on the condition of the immune system. This work proposes W-Net, a CNN-based method for WBC classification. We evaluate W-Net on a real-world large-scale dataset, obtained from The Catholic University of Korea, that includes 6,562 real images of the five WBC types. W-Net achieves an average accuracy of 97%.
CRSep 20, 2019
COPYCAT: Practical Adversarial Attacks on Visualization-Based Malware DetectionAminollah Khormali, Ahmed Abusnaina, Songqing Chen et al.
Despite many attempts, the state-of-the-art of adversarial machine learning on malware detection systems generally yield unexecutable samples. In this work, we set out to examine the robustness of visualization-based malware detection system against adversarial examples (AEs) that not only are able to fool the model, but also maintain the executability of the original input. As such, we first investigate the application of existing off-the-shelf adversarial attack approaches on malware detection systems through which we found that those approaches do not necessarily maintain the functionality of the original inputs. Therefore, we proposed an approach to generate adversarial examples, COPYCAT, which is specifically designed for malware detection systems considering two main goals; achieving a high misclassification rate and maintaining the executability and functionality of the original input. We designed two main configurations for COPYCAT, namely AE padding and sample injection. While the first configuration results in untargeted misclassification attacks, the sample injection configuration is able to force the model to generate a targeted output, which is highly desirable in the malware attribution setting. We evaluate the performance of COPYCAT through an extensive set of experiments on two malware datasets, and report that we were able to generate adversarial samples that are misclassified at a rate of 98.9% and 96.5% with Windows and IoT binary datasets, respectively, outperforming the misclassification rates in the literature. Most importantly, we report that those AEs were executable unlike AEs generated by off-the-shelf approaches. Our transferability study demonstrates that the generated AEs through our proposed method can be generalized to other models.
CRApr 6, 2019
Exploring the Attack Surface of Blockchain: A Systematic OverviewMuhammad Saad, Jeffrey Spaulding, Laurent Njilla et al.
In this paper, we systematically explore the attack surface of the Blockchain technology, with an emphasis on public Blockchains. Towards this goal, we attribute attack viability in the attack surface to 1) the Blockchain cryptographic constructs, 2) the distributed architecture of the systems using Blockchain, and 3) the Blockchain application context. To each of those contributing factors, we outline several attacks, including selfish mining, the 51% attack, Domain Name System (DNS) attacks, distributed denial-of-service (DDoS) attacks, consensus delay (due to selfish behavior or distributed denial-of-service attacks), Blockchain forks, orphaned and stale blocks, block ingestion, wallet thefts, smart contract attacks, and privacy attacks. We also explore the causal relationships between these attacks to demonstrate how various attack vectors are connected to one another. A secondary contribution of this work is outlining effective defense measures taken by the Blockchain technology or proposed by researchers to mitigate the effects of these attacks and patch associated vulnerabilities
CRFeb 11, 2019
Analyzing, Comparing, and Detecting Emerging Malware: A Graph-based ApproachHisham Alasmary, Aminollah Khormali, Afsah Anwar et al.
The growth in the number of Android and Internet of Things (IoT) devices has witnessed a parallel increase in the number of malicious software (malware), calling for new analysis approaches. We represent binaries using their graph properties of the Control Flow Graph (CFG) structure and conduct an in-depth analysis of malicious graphs extracted from the Android and IoT malware to understand their differences. Using 2,874 and 2,891 malware binaries corresponding to IoT and Android samples, we analyze both general characteristics and graph algorithmic properties. Using the CFG as an abstract structure, we then emphasize various interesting findings, such as the prevalence of unreachable code in Android malware, noted by the multiple components in their CFGs, and larger number of nodes in the Android malware, compared to the IoT malware, highlighting a higher order of complexity. We implement a Machine Learning based classifiers to detect IoT malware from benign ones, and achieved an accuracy of 97.9% using Random Forests (RF).
CRJan 4, 2019
Network-based Analysis and Classification of Malware using Behavioral Artifacts OrderingAziz Mohaisen, Omar Alrawi, Jeman Park et al.
Using runtime execution artifacts to identify malware and its associated family is an established technique in the security domain. Many papers in the literature rely on explicit features derived from network, file system, or registry interaction. While effective, the use of these fine-granularity data points makes these techniques computationally expensive. Moreover, the signatures and heuristics are often circumvented by subsequent malware authors. In this work, we propose Chatter, a system that is concerned only with the order in which high-level system events take place. Individual events are mapped onto an alphabet and execution traces are captured via terse concatenations of those letters. Then, leveraging an analyst labeled corpus of malware, n-gram document classification techniques are applied to produce a classifier predicting malware family. This paper describes that technique and its proof-of-concept evaluation. In its prototype form, only network events are considered and eleven malware families are used. We show the technique achieves 83%-94% accuracy in isolation and makes non-trivial performance improvements when integrated with a baseline classifier of combined order features to reach an accuracy of up to 98.8%.
CRJun 29, 2018
Gruut: A Fully-Decentralized P2P Public LedgerDaeHun Nyang
Owing to Satoshi Nakamoto's brilliant idea, a P2P public ledger is shown to be implementable in anonymous network. Any Internet user can then join the anonymous network and contribute to the P2P public ledger by providing their computing power or proof-of-work. The proof-of-work is a clever implementation of one-CPU-one-vote by anonymous participants, and it protects the Bitcoin ledger from illegal modification. To compensate the nodes for their work, a cryptocurrency called Bitcoin is issued and given to nodes. However, the very nature of anonymity of the ledger and the cryptocurrency prevent the technology from being used in fiat money economy. Cryptocurrencies are not traceable even if they are used for money laundering or tax evasion, and the value of cryptocurrencies is not stable but fluctuates wildly. In this white paper, we introduce Gruut, a P2P ledger to implement a universal financial platform for fiat money. For this purpose, we introduce a new consensus algorithm called `proof-of-population,' which is one instance of `proof of public collaboration.' It can be used for multiple purposes; as a P2P ledger for banks, as a powerful tool for payment, including micropayment, and as a tool for any type of financial transactions. Even better, it distributes the profit obtained from transaction fee, currently dominated by a third party, to peers that cannot be centralized. Energy requirements of Gruut are so low that it is possible to run our software on a smartphone or on a personal computer without a graphic card.