Kim‐Kwang Raymond Choo

h-index: 125

41 papers

1,655 citations

Novelty: 30%

AI Score: 37

Ranked #89,938 of 194,257 authors (top 46%); #2,211 in CR (top 33%)

41 Papers

7.1 · CR · Mar 21, 2023

Poisoning Attacks in Federated Edge Learning for Digital Twin 6G-enabled IoTs: An Anticipatory Study

Mohamed Amine Ferrag, Burak Kantarci, Lucas C. Cordeiro et al.

Federated edge learning can be essential in supporting privacy-preserving, artificial intelligence (AI)-enabled activities in digital twin 6G-enabled Internet of Things (IoT) environments. However, we also need to consider the potential for attacks targeting the underlying AI systems (e.g., adversaries seeking to corrupt data on IoT devices during local updates or to corrupt the model updates); hence, in this article, we present an anticipatory study of poisoning attacks in federated edge learning for digital twin 6G-enabled IoT environments. Specifically, we study the influence of adversaries on the training and development of federated learning models in digital twin 6G-enabled IoT environments. We demonstrate that attackers can carry out poisoning attacks in two different learning settings, namely centralized learning and federated learning, and that successful attacks can severely reduce the model's accuracy. We comprehensively evaluate the attacks on a new cyber security dataset designed for IoT applications with three deep neural networks under both independent and identically distributed (IID) and non-independent and identically distributed (Non-IID) data. On an attack classification problem, the poisoning attacks can decrease accuracy from 94.93% to 85.98% with IID data and from 94.18% to 30.04% with Non-IID data.
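To make the attack model concrete, here is a minimal sketch of label-flipping data poisoning against federated averaging (FedAvg). It is not the authors' setup: the paper uses three deep neural networks on an IoT security dataset, whereas this toy assumes a one-dimensional, two-class problem, a nearest-centroid "model", five IID clients, and hypothetical helper names (`make_data`, `flip_labels`, `local_train`, `fed_avg`, `run`).

```python
import random
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, class label)


def make_data(rng: random.Random, n: int) -> List[Sample]:
    """Toy IID data: class 0 centred at -1.0, class 1 centred at +1.0."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(-1.0 if y == 0 else 1.0, 0.3), y))
    return data


def flip_labels(data: List[Sample]) -> List[Sample]:
    """Label-flipping poisoning: a malicious client swaps labels 0 <-> 1 locally."""
    return [(x, 1 - y) for x, y in data]


def local_train(data: List[Sample]) -> Tuple[List[float], int]:
    """Local 'model' = one centroid per class (a nearest-centroid classifier)."""
    sums, counts = [0.0, 0.0], [0, 0]
    for x, y in data:
        sums[y] += x
        counts[y] += 1
    return [sums[c] / max(counts[c], 1) for c in (0, 1)], len(data)


def fed_avg(updates: List[Tuple[List[float], int]]) -> List[float]:
    """FedAvg: average client parameters weighted by local sample count."""
    total = sum(n for _, n in updates)
    return [sum(model[c] * n for model, n in updates) / total for c in (0, 1)]


def accuracy(model: List[float], data: List[Sample]) -> float:
    hits = sum((0 if abs(x - model[0]) <= abs(x - model[1]) else 1) == y for x, y in data)
    return hits / len(data)


def run(num_clients: int = 5, num_malicious: int = 0, seed: int = 0) -> float:
    rng = random.Random(seed)
    updates = []
    for i in range(num_clients):
        data = make_data(rng, 200)
        if i < num_malicious:  # the first num_malicious clients are adversarial
            data = flip_labels(data)
        updates.append(local_train(data))
    return accuracy(fed_avg(updates), make_data(rng, 1000))


print(run(num_malicious=0), run(num_malicious=3))
```

With three of five clients flipping labels, the averaged centroids cross over and the global model's predictions invert, so accuracy collapses; with no malicious clients it stays near perfect. A fuller study would also vary the fraction of malicious clients and the IID versus Non-IID split.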

3.9 · CL · Mar 22, 2023

Towards Understanding the Generalization of Medical Text-to-SQL Models and Datasets

Richard Tarbell, Kim-Kwang Raymond Choo, Glenn Dietrich et al.

Electronic medical records (EMRs) are stored in relational databases. It can be challenging to access the required information if the user is unfamiliar with the database schema or general database fundamentals. Hence, researchers have explored text-to-SQL generation methods that give healthcare professionals direct access to EMR data without needing a database expert. However, currently available datasets have essentially been "solved", with state-of-the-art models achieving accuracy near or above 90%. In this paper, we show that there is still a long way to go before text-to-SQL generation in the medical domain is solved. To show this, we create new splits of the existing medical text-to-SQL dataset MIMICSQL that better measure the generalizability of the resulting models. We evaluate state-of-the-art language models on our new splits, showing substantial drops in performance, with accuracy dropping from up to 92% to 28%, and thus substantial room for improvement. Moreover, we introduce a novel data augmentation approach to improve the generalizability of the language models. Overall, this paper is the first step towards developing more robust text-to-SQL models in the medical domain. (The dataset and code will be released upon acceptance.)
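The abstract does not specify how the new MIMICSQL splits are built. One common way to probe generalization, sketched here under that assumption, is a template-based split: SQL literals are masked so that queries differing only in values share a template, and whole templates are assigned to either train or test. The example question-SQL pairs and the `sql_template` and `template_split` helpers are hypothetical.

```python
import random
import re
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (natural-language question, SQL query)


def sql_template(sql: str) -> str:
    """Mask quoted strings and numbers so queries that differ only in values match."""
    sql = re.sub(r"'[^']*'", "'<val>'", sql)
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "<num>", sql)
    return " ".join(sql.split()).lower()


def template_split(pairs: List[Pair], test_frac: float = 0.2, seed: int = 0):
    """Assign whole SQL templates to train or test, so no test template is seen in training."""
    groups: Dict[str, List[Pair]] = {}
    for question, sql in pairs:
        groups.setdefault(sql_template(sql), []).append((question, sql))
    templates = sorted(groups)
    random.Random(seed).shuffle(templates)
    n_test = max(1, round(len(templates) * test_frac))
    test_templates = set(templates[:n_test])
    train = [p for t in templates if t not in test_templates for p in groups[t]]
    test = [p for t in templates if t in test_templates for p in groups[t]]
    return train, test


# Hypothetical MIMICSQL-style examples, for illustration only.
pairs = [
    ("how many patients are younger than 30?", "SELECT COUNT(*) FROM demographic WHERE age < 30"),
    ("how many patients are younger than 65?", "SELECT COUNT(*) FROM demographic WHERE age < 65"),
    ("what is the gender of patient 1234?", "SELECT gender FROM demographic WHERE subject_id = '1234'"),
    ("what is the gender of patient 5678?", "SELECT gender FROM demographic WHERE subject_id = '5678'"),
    ("how many patients have hypertension?", "SELECT COUNT(*) FROM diagnoses WHERE short_title = 'Hypertension NOS'"),
]
train, test = template_split(pairs)
print(len(train), len(test))
```

Because templates rather than individual questions are held out, a model cannot score well on the test split simply by memorizing query structures seen during training.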

11.3 · CR · Apr 10, 2022 · Code

BABD: A Bitcoin Address Behavior Dataset for Pattern Analysis

Yuexin Xiang, Yuchen Lei, Ding Bao et al.

Cryptocurrencies are no longer just the preferred option for cybercriminal activities on darknets, owing to their increasing adoption in mainstream applications. This is partly due to the transparency of the underpinning ledgers, where anyone can access transaction records on the public ledger. In this paper, we build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled samples, making it, to our knowledge, the largest publicly available labeled Bitcoin address behavior dataset. We then evaluate common machine learning models on the proposed dataset, namely k-nearest neighbors, decision tree, random forest, multilayer perceptron, and XGBoost. The results show that the accuracy of these models on the multi-class classification task ranges from 93.24% to 97.13%. We also analyze the proposed features and their relationships based on the experiments, and propose a k-hop subgraph generation algorithm that extracts a k-hop subgraph from the entire Bitcoin transaction graph, constructed as a directed heterogeneous multigraph, starting from a specific Bitcoin address node (e.g., a known transaction associated with a criminal investigation). In addition, we conduct a preliminary analysis of the behavior patterns of different types of Bitcoin addresses according to the extracted features.
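A minimal sketch of k-hop subgraph extraction by breadth-first search over a directed multigraph stored as an edge list. The edge representation, the choice to traverse edges in both directions by default, and the node names are assumptions for illustration; the authors' algorithm may differ, for example in how it treats address and transaction node types.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, str]  # (source, target, edge key such as a transaction id)


def k_hop_subgraph(edges: List[Edge], start: Hashable, k: int, undirected: bool = True):
    """Breadth-first search that keeps every node within k hops of `start`
    and every edge traversed from a node closer than k hops.

    Parallel edges between the same pair of nodes are all kept, as in a
    multigraph. With undirected=True, edges are followed in both directions."""
    adj: Dict[Hashable, List[int]] = defaultdict(list)
    for i, (u, v, _) in enumerate(edges):
        adj[u].append(i)
        if undirected and v != u:
            adj[v].append(i)
    dist = {start: 0}
    queue = deque([start])
    kept: Set[int] = set()
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the k-th hop
        for i in adj[node]:
            u, v, _ = edges[i]
            neighbour = v if u == node else u
            kept.add(i)
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return set(dist), [edges[i] for i in sorted(kept)]


edges = [
    ("addr_A", "tx_1", "in"),
    ("tx_1", "addr_B", "out_0"),
    ("tx_1", "addr_B", "out_1"),  # parallel edge: two outputs to the same address
    ("addr_B", "tx_2", "in"),
    ("tx_2", "addr_C", "out_0"),
]
nodes, sub_edges = k_hop_subgraph(edges, "addr_A", k=2)
print(sorted(nodes), len(sub_edges))
```

Storing edges by index rather than in a set of node pairs is what preserves parallel edges, which matters when one transaction pays the same address several times.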

9.1 · CV · Dec 21, 2023 · Code

MFABA: A More Faithful and Accelerated Boundary-based Attribution Method for Deep Neural Networks

Zhiyu Zhu, Huaming Chen, Jiayu Zhang et al.

To better understand the output of deep neural networks (DNNs), attribution-based methods have become an important approach to model interpretability; they assign a score to each input dimension to indicate its importance to the model outcome. Notably, attribution methods use the axioms of sensitivity and implementation invariance to ensure the validity and reliability of attribution results. Yet existing attribution methods present challenges for both effective interpretation and efficient computation. In this work, we introduce MFABA, an attribution algorithm that adheres to these axioms, as a novel method for interpreting DNNs. We also provide a theoretical proof and in-depth analysis of the MFABA algorithm, and conduct large-scale experiments. The results demonstrate its superiority: it runs over 101.5142 times faster than state-of-the-art attribution algorithms. The effectiveness of MFABA is thoroughly evaluated through statistical analysis in comparison to other methods, and the full implementation package is open-source at: https://github.com/LMBTough/MFABA
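MFABA's own algorithm is not given in the abstract, so as a point of reference this sketch implements Integrated Gradients, a well-known attribution method that satisfies the sensitivity and implementation-invariance axioms mentioned above. It uses a toy quadratic model with an analytic gradient (`f`, `grad_f`, and the weights `w` are hypothetical) and checks the related completeness property: attributions sum to F(x) - F(baseline).

```python
from typing import Callable, List

Vector = List[float]


def integrated_gradients(grad: Callable[[Vector], Vector], x: Vector,
                         baseline: Vector, steps: int = 50) -> Vector:
    """Midpoint Riemann-sum approximation of Integrated Gradients:
    IG_i = (x_i - b_i) * integral over a in [0, 1] of dF/dx_i(b + a * (x - b))."""
    n = len(x)
    total = [0.0] * n
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint of the s-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point)
        for i in range(n):
            total[i] += g[i]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Toy model F(x) = sum(w_i * x_i^2) with a known analytic gradient.
w = [1.0, -2.0, 0.5]


def f(x: Vector) -> float:
    return sum(wi * xi * xi for wi, xi in zip(w, x))


def grad_f(x: Vector) -> Vector:
    return [2 * wi * xi for wi, xi in zip(w, x)]


x = [1.0, 2.0, -3.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(grad_f, x, baseline)
print(attr, sum(attr), f(x) - f(baseline))
```

For this model and a zero baseline each attribution equals w_i * x_i^2, and the midpoint rule is exact here because the integrand is linear in the path parameter; for real DNNs the number of steps trades accuracy for cost, which is the kind of overhead faster methods aim to reduce.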

11.3 · CV · Jan 11, 2024 · Code

GE-AdvGAN: Improving the transferability of adversarial samples by gradient editing-based adversarial generative model

Zhiyu Zhu, Huaming Chen, Xinyi Wang et al.

Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied for generating various types of data, e.g., images, text, and audio. Accordingly, their promising performance has led to GAN-based adversarial attack methods in white-box and black-box attack scenarios. Transferable black-box attacks are important because they remain effective across different models and settings, which more closely reflects real-world applications. However, it remains challenging for such methods to retain their performance in generating transferable adversarial examples. Meanwhile, we observe that some enhanced gradient-based transferable adversarial attack algorithms require a long time to generate adversarial samples. Thus, in this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples while improving the algorithm's efficiency. The main approach is to optimise the training process of the generator parameters. Based on functional and characteristic similarity analysis, we introduce a novel gradient editing (GE) mechanism and verify its feasibility in generating transferable samples on various models. Moreover, by exploring frequency domain information to determine the gradient editing direction, GE-AdvGAN can generate highly transferable adversarial samples while minimizing execution time compared to state-of-the-art transferable adversarial attack algorithms. The performance of GE-AdvGAN is comprehensively evaluated through large-scale experiments on different datasets, and the results demonstrate the superiority of our algorithm. The code for our algorithm is available at: https://github.com/LMBTough/GE-advGAN

14.2 · LG · Dec 8, 2024 · Code

DapperFL: Domain Adaptive Federated Learning with Model Fusion Pruning for Edge Devices

Yongzhe Jia, Xuyun Zhang, Hongsheng Hu et al.

Federated learning (FL) has emerged as a prominent machine learning paradigm in edge computing environments, enabling edge devices to collaboratively optimize a global model without sharing their private data. However, existing FL frameworks suffer from efficacy deterioration due to the system heterogeneity inherent in edge computing, especially in the presence of domain shifts across local data. In this paper, we propose a heterogeneous FL framework, DapperFL, to enhance model performance across multiple domains. In DapperFL, we introduce a dedicated Model Fusion Pruning (MFP) module that produces personalized compact local models for clients to address the system heterogeneity challenges. The MFP module prunes local models with fused knowledge obtained from both the local domain and the remaining domains, ensuring robustness to domain shifts. Additionally, we design a Domain Adaptive Regularization (DAR) module to further improve the overall performance of DapperFL. The DAR module employs regularization generated by the pruned model, aiming to learn robust representations across domains. Furthermore, we introduce a specific aggregation algorithm for aggregating heterogeneous local models with tailored architectures and weights. We implement DapperFL on a real-world FL platform with heterogeneous clients. Experimental results on benchmark datasets with multiple domains demonstrate that DapperFL outperforms several state-of-the-art FL frameworks by up to 2.28%, while achieving significant model volume reductions ranging from 20% to 80%. Our code is available at: https://github.com/jyzgh/DapperFL.
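The abstract says DapperFL uses a specific algorithm to aggregate heterogeneous pruned models but does not describe it. A common generic approach, sketched here as an assumption rather than DapperFL's method, is mask-aware weighted averaging: each parameter is averaged only over the clients that kept it. Models are flattened to lists of floats, and the default of 0.0 for parameters pruned by every client is hypothetical (a real system might keep the previous global value instead).

```python
from typing import List


def masked_aggregate(weights: List[List[float]], masks: List[List[int]],
                     sizes: List[int]) -> List[float]:
    """Aggregate pruned client models that share one flattened parameter layout.

    Client i sends a full-length vector weights[i] and a binary mask masks[i]
    (1 = parameter kept, 0 = pruned). Each position is averaged over the
    clients that kept it, weighted by local dataset size sizes[i]."""
    dim = len(weights[0])
    out = [0.0] * dim
    for j in range(dim):
        num = den = 0.0
        for w, m, n in zip(weights, masks, sizes):
            if m[j]:
                num += n * w[j]
                den += n
        out[j] = num / den if den else 0.0  # hypothetical default when all clients pruned j
    return out


# Two clients with different pruning masks over a 3-parameter model.
print(masked_aggregate([[1.0, 2.0, 3.0], [3.0, 0.0, 5.0]], [[1, 1, 0], [1, 0, 1]], [1, 3]))
```

Dividing by the kept-weight total rather than by all clients avoids pulling parameters toward zero just because some clients pruned them.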

1.2 · CV · Aug 14, 2020 · Code

Generating Image Adversarial Examples by Embedding Digital Watermarks

Yuexin Xiang, Tiantian Li, Wei Ren et al.

With the increasing attention paid to deep neural network (DNN) models, attacks targeting such models are also emerging. For example, an attacker may carefully construct images in specific ways (known as adversarial examples) to mislead DNN models into outputting incorrect classification results. Correspondingly, many efforts have been made to detect and mitigate adversarial examples, usually for certain dedicated attacks. In this paper, we propose a novel digital watermark-based method to generate image adversarial examples that fool DNN models. Specifically, partial main features of the watermark image are embedded into the host image almost invisibly, aiming to tamper with and damage the recognition capabilities of the DNN models. We devise an efficient mechanism to select host images and watermark images, and use an improved discrete wavelet transform (DWT)-based Patchwork watermarking algorithm with a set of valid hyperparameters to embed digital watermarks from the watermark image dataset into original images, thereby generating image adversarial examples. The experimental results show that the attack success rate on common DNN models can reach an average of 95.47% on the CIFAR-10 dataset, with a highest rate of 98.71%. Moreover, our scheme generates large numbers of adversarial examples efficiently, taking an average of 1.17 seconds to complete the attack on each image in the CIFAR-10 dataset. In addition, we design a baseline experiment that uses watermark images generated from Gaussian noise as the watermark image dataset, which also demonstrates the effectiveness of our scheme. We also propose a modified discrete cosine transform (DCT)-based Patchwork watermarking algorithm. To ensure repeatability and reproducibility, the source code is available on GitHub.
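The classic Patchwork idea underlying the paper's DWT- and DCT-based variants can be sketched in the spatial domain: a secret key selects pixel pairs, one pixel of each pair is brightened by delta and the other darkened, and the mean pair difference then reveals the mark. This is a simplified illustration assuming a flattened 8-bit grayscale image and hypothetical helper names; it omits the wavelet transform, the watermark-image features, and the host/watermark selection mechanism described above.

```python
import random
from typing import List, Tuple


def patchwork_pairs(num_pixels: int, n: int, key: int) -> List[Tuple[int, int]]:
    """Use the secret key to pick n disjoint (a, b) pixel-index pairs."""
    idx = random.Random(key).sample(range(num_pixels), 2 * n)
    return list(zip(idx[:n], idx[n:]))


def embed(pixels: List[int], key: int, n: int, delta: int) -> List[int]:
    """Brighten each 'a' pixel and darken each 'b' pixel by delta, clipped to [0, 255]."""
    out = list(pixels)
    for a, b in patchwork_pairs(len(pixels), n, key):
        out[a] = min(255, out[a] + delta)
        out[b] = max(0, out[b] - delta)
    return out


def detect(pixels: List[int], key: int, n: int) -> float:
    """Mean of (a - b) over the keyed pairs: about 0 if unmarked, about 2*delta if marked."""
    pairs = patchwork_pairs(len(pixels), n, key)
    return sum(pixels[a] - pixels[b] for a, b in pairs) / n


rng = random.Random(7)
host = [rng.randint(20, 235) for _ in range(64 * 64)]  # flattened synthetic 64x64 image
marked = embed(host, key=42, n=1000, delta=5)
print(detect(host, 42, 1000), detect(marked, 42, 1000))
```

Because the pairs are disjoint and no pixel is clipped in this example, the detection statistic of the marked image exceeds that of the host by exactly 2 * delta.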

7.1 · LG · Dec 18, 2025

Feature-Selective Representation Misdirection for Machine Unlearning

Taozhao Chen, Linghan Huang, Kim-Kwang Raymond Choo et al.

As large language models (LLMs) are increasingly adopted in safety-critical and regulated sectors, the retention of sensitive or prohibited knowledge introduces escalating risks, ranging from privacy leakage and regulatory non-compliance to potential misuse. Recent studies suggest that machine unlearning can help ensure that deployed models comply with evolving legal, safety, and governance requirements. However, current unlearning techniques assume a clean separation between forget and retain datasets, which is difficult to achieve in operational settings characterized by highly entangled distributions. In such scenarios, perturbation-based methods often degrade general model utility or fail to ensure safety. To address this, we propose Selective Representation Misdirection for Unlearning (SRMU), a novel, principled activation-editing framework that enforces feature-aware and directionally controlled perturbations. Unlike indiscriminate perturbations of model weights, SRMU employs a structured misdirection vector together with an activation importance map. This allows SRMU to selectively suppress harmful representations while preserving utility on benign ones. Experiments are conducted on the widely used WMDP benchmark across low- and high-entanglement configurations. Empirical results show that SRMU delivers state-of-the-art unlearning performance with minimal utility loss, and remains effective under 20-30% overlap, where existing baselines collapse. SRMU provides a robust foundation for safety-driven model governance, privacy compliance, and controlled knowledge removal in emerging LLM-based applications. We release the replication package at https://figshare.com/s/d5931192a8824de26aff.

8.8 · LG · Dec 27, 2023

FairCompass: Operationalising Fairness in Machine Learning

Jessica Liu, Huaming Chen, Jun Shen et al.

As artificial intelligence (AI) increasingly becomes an integral part of our societal and individual activities, there is a growing imperative to develop responsible AI solutions. Despite the diverse assortment of machine learning fairness solutions proposed in the literature, there is reportedly a lack of practical implementation of these tools in real-world applications. Industry experts have discussed in depth the challenges of operationalising fairness in the development of machine learning-empowered solutions, and have advocated a shift toward human-centred approaches to mitigate the limitations of existing techniques. In this work, we propose a human-in-the-loop approach to fairness auditing, presenting a mixed visual analytical system (hereafter referred to as 'FairCompass') that integrates both a subgroup discovery technique and a decision tree-based schema for end users. Moreover, we integrate an Exploration, Guidance and Informed Analysis loop to facilitate the use of the Knowledge Generation Model for Visual Analytics in FairCompass. We evaluate the effectiveness of FairCompass for fairness auditing in a real-world scenario, and the findings demonstrate the system's potential for real-world deployment. We anticipate that this work will address current gaps in fairness research and facilitate the operationalisation of fairness in machine learning systems.

8.2 · CL · Jun 25, 2024

Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats

Ryan Pavlich, Nima Ebadi, Richard Tarbell et al.

Recognizing the promise of natural language interfaces to databases, prior studies have emphasized the development of text-to-SQL systems. While substantial progress has been made in this field, existing research has concentrated on generating SQL statements from text queries. The broader challenge, however, lies in inferring new information about the returned data. Our research makes two major contributions to address this gap. First, we introduce a novel Internet of Things (IoT) text-to-SQL dataset comprising 10,985 text-SQL pairs and 239,398 rows of network traffic activity. The dataset includes query types that are scarce in prior text-to-SQL datasets, notably temporal queries. Our dataset is sourced from a smart building's IoT ecosystem and covers sensor readings and network traffic data. Second, our dataset supports two-stage processing, in which the data returned by a generated SQL query (network traffic) can be classified as malicious or not. Our results show that jointly training to query and to infer information about the data can improve overall text-to-SQL performance, nearly matching substantially larger models. We also show that current large language models (e.g., GPT-3.5) struggle to infer new information about returned data; thus, our dataset provides a novel test bed for integrating complex domain-specific reasoning into LLMs.

6.4 · LG · May 3, 2024

Holistic Evaluation Metrics: Use Case Sensitive Evaluation Metrics for Federated Learning

Yanli Li, Jehad Ibrahim, Huaming Chen et al.

A large number of federated learning (FL) algorithms have been proposed for different applications and from varying perspectives. However, the evaluation of such approaches often relies on a single metric (e.g., accuracy). This practice fails to account for the unique demands and diverse requirements of different use cases. Thus, how to comprehensively evaluate an FL algorithm and determine the most suitable candidate for a designated use case remains an open question. To address this research gap, we introduce the Holistic Evaluation Metrics (HEM) for FL in this work. Specifically, we focus on three primary use cases: Internet of Things (IoT), smart devices, and institutions. The evaluation metric encompasses various aspects, including accuracy, convergence, computational efficiency, fairness, and personalization. We then assign an importance vector to each use case, reflecting its distinct performance requirements and priorities. Finally, the HEM index is generated by integrating these metric components with their respective importance vectors. By evaluating different FL algorithms in these three prevalent use cases, our experimental results demonstrate that HEM can effectively assess and identify the FL algorithms best suited to particular scenarios. We anticipate that this work will shed light on the evaluation of pragmatic FL algorithms in real-world applications.
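The HEM construction described above (per-metric scores combined with a use-case importance vector) can be sketched as a normalised weighted sum. The metric names follow the abstract, but the scores, the importance weights, and the assumption that every metric is already scaled to [0, 1] with higher meaning better are hypothetical, not the paper's values or exact formula.

```python
from typing import Dict

METRICS = ("accuracy", "convergence", "efficiency", "fairness", "personalization")


def hem_index(scores: Dict[str, float], importance: Dict[str, float]) -> float:
    """Weighted sum of metric scores (assumed in [0, 1], higher is better),
    with the importance vector normalised to sum to 1."""
    total = sum(importance[m] for m in METRICS)
    return sum(scores[m] * importance[m] / total for m in METRICS)


# Hypothetical importance vectors and scores, for illustration only.
iot = {"accuracy": 0.2, "convergence": 0.2, "efficiency": 0.4, "fairness": 0.1, "personalization": 0.1}
institution = {"accuracy": 0.5, "convergence": 0.1, "efficiency": 0.05, "fairness": 0.25, "personalization": 0.1}
algo_a = {"accuracy": 0.92, "convergence": 0.6, "efficiency": 0.3, "fairness": 0.7, "personalization": 0.5}
algo_b = {"accuracy": 0.85, "convergence": 0.7, "efficiency": 0.8, "fairness": 0.6, "personalization": 0.5}

for name, weights in (("IoT", iot), ("institution", institution)):
    print(name, round(hem_index(algo_a, weights), 3), round(hem_index(algo_b, weights), 3))
```

With these made-up numbers, algorithm B ranks higher under the IoT weighting and algorithm A under the institutional weighting, illustrating how the importance vector changes which algorithm is preferred.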

4.3 · NI · Jun 27, 2021

A Systematic Review of Bio-Cyber Interface Technologies and Security Issues for Internet of Bio-Nano Things

Sidra Zafar, Mohsin Nazir, Taimur Bakhshi et al.

Advances in synthetic biology and nanotechnology have contributed to the design of tools that can be used to control, reuse, modify, and re-engineer cells' structure, as well as enabling engineers to effectively use biological cells as programmable substrates to realize Bio-Nano Things (biological embedded computing devices). Bio-Nano Things are generally tiny, non-intrusive, and concealable devices that can be used for in-vivo applications such as intra-body sensing and actuation networks, where the use of artificial devices can be detrimental. Such (nano-scale) devices can be used in various healthcare settings, such as continuous health monitoring, targeted drug delivery, and nano-surgeries. These devices can also be grouped to form a collaborative network (i.e., a nanonetwork), whose performance can potentially be improved when connected to higher-bandwidth external networks such as the Internet, e.g., via 5G. However, to realize the Internet of Bio-Nano Things (IoBNT) paradigm, it is also important to seamlessly connect the biological environment with the technological landscape through a dynamic interface design that converts biochemical signals from the human body into equivalent electromagnetic signals (and vice versa). Unfortunately, this risks exposing internal biological mechanisms to cyber-based sensing and medical actuation, with potential security and privacy implications. This paper comprehensively reviews bio-cyber interfaces for the IoBNT architecture, focusing on bio-cyber interfacing options such as biologically inspired bio-electronic devices, RFID-enabled implantable chips, and electronic tattoos. This study also identifies known and potential security and privacy vulnerabilities, along with mitigation strategies, for consideration in future IoBNT designs and implementations.

1.4 · CV · May 19, 2021 · Code

A Lightweight Privacy-Preserving Scheme Using Label-based Pixel Block Mixing for Image Classification in Deep Learning

Yuexin Xiang, Tiantian Li, Wei Ren et al.

To ensure the privacy of sensitive data used in training deep learning models, the research community has designed a number of privacy-preserving methods. However, existing schemes are generally designed for textual data, or are inefficient when a large number of images is used for training. Hence, in this paper, we propose a lightweight and efficient approach to preserve image privacy while maintaining the availability of the training set. Specifically, we design a pixel block mixing algorithm for privacy-preserving image classification in deep learning. To evaluate its utility, we use the mixed training set to train ResNet50, VGG16, InceptionV3, and DenseNet121 models on the WIKI dataset and the CNBC face dataset. Experimental findings on the testing set show that our scheme preserves image privacy while maintaining the availability of the training set for the deep learning models. Additionally, the experimental results show that we achieve good performance for the VGG16 model on the WIKI dataset and for both ResNet50 and DenseNet121 on the CNBC dataset. The pixel block mixing algorithm mixes images with fairly high efficiency, and it is computationally challenging for attackers to restore the original training set from the mixed one. Moreover, data augmentation can be applied to the mixed training set to improve training effectiveness.
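The abstract does not detail the pixel block mixing algorithm. One plausible reading, sketched here purely as an assumption, splits each image into b x b blocks and, for every block position, randomly permutes blocks among images that share the same label, so each mixed image still carries a valid class label. Images are nested lists of grayscale values whose height and width are multiples of b; all helper names are hypothetical.

```python
import random
from typing import Dict, List

Image = List[List[int]]  # grayscale image as a list of pixel rows


def split_blocks(img: Image, b: int) -> List[Image]:
    """Cut an image into b x b blocks in row-major order."""
    h, w = len(img), len(img[0])
    return [[row[c:c + b] for row in img[r:r + b]] for r in range(0, h, b) for c in range(0, w, b)]


def join_blocks(blocks: List[Image], h: int, w: int, b: int) -> Image:
    """Inverse of split_blocks: place row-major blocks back into an h x w image."""
    img = [[0] * w for _ in range(h)]
    per_row = w // b
    for k, blk in enumerate(blocks):
        r0, c0 = (k // per_row) * b, (k % per_row) * b
        for i, row in enumerate(blk):
            img[r0 + i][c0:c0 + b] = row
    return img


def mix_by_label(images: List[Image], labels: List[int], b: int, seed: int = 0) -> List[Image]:
    """For each block position, shuffle that block among images with the same label."""
    rng = random.Random(seed)
    h, w = len(images[0]), len(images[0][0])
    assert h % b == 0 and w % b == 0, "image size must be a multiple of the block size"
    blocks = [split_blocks(img, b) for img in images]
    groups: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        groups.setdefault(y, []).append(i)
    for members in groups.values():
        for pos in range(len(blocks[0])):
            column = [blocks[i][pos] for i in members]
            rng.shuffle(column)
            for i, blk in zip(members, column):
                blocks[i][pos] = blk
    return [join_blocks(bl, h, w, b) for bl in blocks]


# Three synthetic 4x4 images; the first two share label 0.
imgs = [[[16 * k + 4 * r + c for c in range(4)] for r in range(4)] for k in range(3)]
mixed = mix_by_label(imgs, [0, 0, 1], b=2, seed=1)
print(mixed[0])
```

Keeping each block at its original position preserves coarse spatial layout, which is one reason a classifier might still learn from mixed images while individual originals are obscured.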

2.3 · CY · May 16, 2021

Investigating Protected Health Information Leakage from Android Medical Applications

George Grispos, Talon Flynn, William Glisson et al.

As smartphones and smartphone applications are widely used in healthcare contexts (e.g., remote healthcare), these devices and applications may need to comply with the Health Insurance Portability and Accountability Act (HIPAA) of 1996. In other words, adequate safeguards to protect users' sensitive information (e.g., personally identifiable information and/or medical history) must be enforced on such devices and applications. In this study, we take a forensic approach to examining the potential for recovering residual data from Android medical applications, with the objective of providing an initial risk assessment of such applications. Our findings (e.g., documentation of the artifacts) also contribute to a better understanding of the types and locations of evidential artifacts that can potentially be recovered from these applications in a digital forensic investigation.

10.7 · CR · May 14, 2021

Consumer, Commercial and Industrial IoT (In)Security: Attack Taxonomy and Case Studies

Christos Xenofontos, Ioannis Zografopoulos, Charalambos Konstantinou et al.

Internet of Things (IoT) devices are becoming ubiquitous in our lives, with applications spanning from the consumer domain to commercial and industrial systems. The steep growth and vast adoption of IoT devices reinforce the importance of sound and robust cybersecurity practices during device development life-cycles. IoT-related vulnerabilities, if successfully exploited, can affect not only the device itself but also the application field in which the IoT device operates. Identifying and addressing every single vulnerability is an arduous, if not impossible, task. Attack taxonomies can assist in classifying attacks and their corresponding vulnerabilities. Security countermeasures and best practices can then be leveraged to mitigate threats and vulnerabilities before they escalate into catastrophic attacks, and to ensure overall secure IoT operation. Therefore, in this paper, we provide an attack taxonomy that takes into consideration the different layers of the IoT stack, i.e., device, infrastructure, communication, and service, and each layer's designated characteristics that can be exploited by adversaries. Furthermore, using nine real-world cybersecurity incidents that targeted IoT devices deployed in the consumer, commercial, and industrial sectors, we describe the IoT-related vulnerabilities, exploitation procedures, attacks, impacts, and potential mitigation mechanisms and protection strategies. These (and many other) incidents highlight the underlying security concerns of IoT systems and demonstrate the potential attack impacts of such connected ecosystems, while the proposed taxonomy provides a systematic procedure to categorize attacks based on the affected layer and corresponding impact.

7.2 · CR · Dec 21, 2020

DeepKeyGen: A Deep Learning-based Stream Cipher Generator for Medical Image Encryption and Decryption

Yi Ding, Fuyuan Tan, Zhen Qin et al.

The need for medical image encryption is increasingly pronounced, for example to safeguard the privacy of patients' medical imaging data. In this paper, a novel deep learning-based key generation network (DeepKeyGen) is proposed as a stream cipher generator to generate the private key, which can then be used to encrypt and decrypt medical images. In DeepKeyGen, a generative adversarial network (GAN) is adopted as the learning network to generate the private key. Furthermore, a transformation domain (which represents the "style" of the private key to be generated) is designed to guide the learning network through the private key generation process. The goal of DeepKeyGen is to learn a mapping that transforms the initial image into the private key. We evaluate DeepKeyGen using three datasets, namely the Montgomery County chest X-ray dataset, the Ultrasonic Brachial Plexus dataset, and the BraTS18 dataset. The evaluation findings and security analysis show that the proposed key generation network can achieve a high level of security in generating the private key.
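A stream cipher typically combines the key stream with the plaintext by bitwise XOR, which makes encryption and decryption the same operation. Assuming that combination (the abstract does not state how DeepKeyGen's key is applied), this sketch encrypts flattened 8-bit pixels with a key stream; the key here comes from Python's non-cryptographic `random` module purely as a placeholder for a key generated by a network such as DeepKeyGen.

```python
import random
from typing import List


def xor_cipher(pixels: List[int], key_stream: List[int]) -> List[int]:
    """XOR each 8-bit pixel with the matching key byte; the same call decrypts."""
    if len(pixels) != len(key_stream):
        raise ValueError("key stream must be as long as the image")
    return [p ^ k for p, k in zip(pixels, key_stream)]


# Placeholder data: `random` is NOT cryptographically secure and only stands in
# for a learned key stream such as one generated by DeepKeyGen.
rng = random.Random(2020)
image = [rng.randrange(256) for _ in range(8 * 8)]  # flattened 8x8 grayscale image
key = [rng.randrange(256) for _ in range(8 * 8)]
cipher = xor_cipher(image, key)
print(xor_cipher(cipher, key) == image)
```

Since the same function both encrypts and decrypts, the security of such a scheme rests entirely on the key stream being unpredictable and never reused across images.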

18.4 · CR · Oct 19, 2020

A Survey of Machine Learning Techniques in Adversarial Image Forensics

Ehsan Nowroozi, Ali Dehghantanha, Reza M. Parizi et al.

Image forensics plays a crucial role in both criminal investigations (e.g., dissemination of fake images to spread racial hate or false narratives about specific ethnic groups) and civil litigation (e.g., defamation). Increasingly, machine learning approaches are also utilized in image forensics. However, machine learning-based approaches have a number of limitations and vulnerabilities, for example, the difficulty of detecting adversarial (image) examples, with real-world consequences (e.g., inadmissible evidence or wrongful conviction). Therefore, with a focus on image forensics, this paper surveys techniques that can be used to enhance the robustness of machine learning-based binary manipulation detectors in various adversarial scenarios.

11.5 · CR · Sep 23, 2020

Pocket Diagnosis: Secure Federated Learning against Poisoning Attack in the Cloud

Zhuoran Ma, Jianfeng Ma, Yinbin Miao et al.

Federated learning has become prevalent in medical diagnosis due to its effectiveness in training a federated model among multiple health institutions (i.e., Data Islands (DIs)). However, increasingly large-scale DI-level poisoning attacks, which inject poisoned data into certain DIs to corrupt the availability of the federated model, have exposed a vulnerability in federated learning. Previous work on federated learning has been inadequate in ensuring both the privacy of DIs and the availability of the final federated model. In this paper, we design a secure federated learning mechanism with multiple keys, called SFPA, to prevent DI-level poisoning attacks for medical diagnosis. Concretely, SFPA provides privacy-preserving random forest-based federated learning using multi-key secure computation, which guarantees the confidentiality of DI-related information. Meanwhile, a secure defense strategy over encrypted, locally submitted models is proposed to defend against DI-level poisoning attacks. Finally, our formal security analysis and empirical tests on a public cloud platform demonstrate the security and efficiency of SFPA as well as its capability to resist DI-level poisoning attacks.

2.9 · CR · May 18, 2020

VerifyTL: Secure and Verifiable Collaborative Transfer Learning

Zhuoran Ma, Jianfeng Ma, Yinbin Miao et al.

Getting access to labelled datasets in certain sensitive application domains can be challenging. Hence, one often resorts to transfer learning to transfer knowledge learned from a source domain with sufficient labelled data to a target domain with limited labelled data. However, most existing transfer learning techniques focus only on one-way transfer, which brings no benefit to the source domain. In addition, there is the risk of a covert adversary corrupting a number of domains, which can result in inaccurate prediction or privacy leakage. In this paper, we construct a secure and Verifiable collaborative Transfer Learning scheme, VerifyTL, to support two-way transfer learning over potentially untrusted datasets by improving knowledge transfer from a target domain to a source domain. Further, we equip VerifyTL with a cross transfer unit and a weave transfer unit, employing SPDZ computation to provide privacy guarantees and verification in the two-domain setting and the multi-domain setting, respectively. Thus, VerifyTL is secure against a covert adversary that can compromise up to n-1 out of n data domains. We analyze the security of VerifyTL and evaluate its performance on two real-world datasets. Experimental results show that VerifyTL achieves significant performance gains over existing secure learning schemes.
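SPDZ represents secret values as additive shares over a finite field, which is why compromising up to n-1 of n parties reveals nothing about a shared value. This minimal sketch shows only that share representation and local addition of shares; it omits SPDZ's MACs, preprocessing, and multiplication, and the modulus `P` (the Mersenne prime 2^61 - 1) and helper names are arbitrary choices for illustration.

```python
import secrets
from typing import List

P = 2**61 - 1  # prime modulus (arbitrary choice for this sketch)


def share(x: int, n: int) -> List[int]:
    """Split x into n additive shares mod P; any n-1 of them are uniformly random."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Only the sum of all n shares reveals the secret."""
    return sum(shares) % P


def add_shared(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its two shares locally; no communication is needed."""
    return [(x + y) % P for x, y in zip(a, b)]


a_shares, b_shares = share(10, 4), share(32, 4)
print(reconstruct(add_shared(a_shares, b_shares)))
```

Addition is free because it is linear in the shares; multiplication is where full protocols such as SPDZ need preprocessed material and extra rounds.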

4.9 · CR · Jun 12, 2019

Integrating Privacy Enhancing Techniques into Blockchains Using Sidechains

Reza M. Parizi, Sajad Homayoun, Abbas Yazdinejad et al.

Blockchains are turning into decentralized computing platforms and are gaining worldwide recognition for their unique advantages. An emerging trend beyond payments is that blockchains could enable a new breed of decentralized applications and serve as the foundation for the Internet's security infrastructure. The immutable nature of the blockchain makes it a winner on security and transparency; it is nearly inconceivable for ledgers to be altered in a way not instantly clear to every single user involved. However, most blockchains fall short on privacy, particularly data protection. Garlic Routing and Onion Routing are two major Privacy Enhancing Techniques (PETs) that are popular for anonymization and security. Garlic Routing is a methodology used by the I2P anonymous network to hide the identities of the sender and receiver of data packets by bundling multiple messages into a layered encryption structure. Onion Routing attempts to provide low-latency Internet-based connections that resist traffic analysis, deanonymization attacks, eavesdropping, and other attacks by both outsiders (e.g., Internet routers) and insiders (Onion Routing servers themselves). As there is some controversy over how well these two techniques resist privacy attacks, we propose a PET-Enabled Sidechain (PETES) as a new privacy enhancing technique that integrates Garlic Routing and Onion Routing into a Garlic Onion Routing (GOR) framework suited to the structure of blockchains. The proposed GOR, at this preliminary stage, aims to improve the privacy of transactions in blockchains via the PETES structure.

8.3 · CR · Jun 12, 2019

A Blockchain-based Framework for Detecting Malicious Mobile Applications in App Stores

Sajad Homayoun, Ali Dehghantanha, Reza M. Parizi et al.

The dramatic growth in smartphone malware shows that malicious program developers are shifting from traditional PC systems to smartphone devices. Therefore, security researchers are also proposing novel antimalware methods to provide adequate protection. This paper proposes a Blockchain-Based Malware Detection Framework (B2MDF) for detecting malicious mobile applications in mobile application marketplaces (app stores). The framework consists of two private blockchains, one internal and one external, forming a dual private blockchain, as well as a consortium blockchain for the final decision. The internal private blockchain stores feature blocks extracted by both static and dynamic feature extractors, while the external blockchain stores detection results as blocks for current versions of applications. B2MDF also shares feature blocks with third parties, which helps antimalware vendors provide more accurate solutions.
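One property a blockchain-based design such as B2MDF relies on, namely that stored feature blocks cannot be silently altered, comes from hash chaining. This sketch is a single-node hash chain with hypothetical names (`Block`, `FeatureChain`) and an illustrative payload; it omits consensus, the dual internal/external private chains, and the consortium chain described above.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    index: int
    payload: dict
    prev_hash: str
    hash: str = ""

    def compute_hash(self) -> str:
        """SHA-256 over the block's contents, including the previous block's hash."""
        body = json.dumps(
            {"index": self.index, "payload": self.payload, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


class FeatureChain:
    """Append-only, hash-linked list of feature blocks (single node, no consensus)."""

    def __init__(self) -> None:
        genesis = Block(0, {"genesis": True}, "0" * 64)
        genesis.hash = genesis.compute_hash()
        self.blocks: List[Block] = [genesis]

    def append(self, payload: dict) -> Block:
        prev = self.blocks[-1]
        block = Block(prev.index + 1, payload, prev.hash)
        block.hash = block.compute_hash()
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash and check each link to the previous block."""
        if self.blocks[0].hash != self.blocks[0].compute_hash():
            return False
        for prev, cur in zip(self.blocks, self.blocks[1:]):
            if cur.prev_hash != prev.hash or cur.hash != cur.compute_hash():
                return False
        return True


# Illustrative feature payload for a hypothetical app.
chain = FeatureChain()
chain.append({"app": "com.example.app", "version": "1.0", "static_features": ["SEND_SMS", "READ_CONTACTS"]})
print(chain.verify())
```

Editing any stored payload changes that block's recomputed hash, so `verify` fails; in a real deployment, replication across nodes is what stops an attacker from simply rehashing the whole chain.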

19.5CRSep 7, 2018

Empirical Vulnerability Analysis of Automated Smart Contracts Security Testing on Blockchains

Reza M. Parizi, Ali Dehghantanha, Kim-Kwang Raymond Choo et al.

The emerging blockchain technology supports a paradigm shift toward decentralized computing and is a rapidly approaching phenomenon. While blockchain is thought of primarily as the basis of Bitcoin, its application has grown far beyond cryptocurrencies due to the introduction of smart contracts. Smart contracts are self-enforcing pieces of software that reside and run on a hosting blockchain. Using blockchain-based smart contracts for secure and transparent management of interactions (authentication, connection, and transaction) in Internet-enabled environments, mostly IoT, is a niche area of research and practice. However, writing trustworthy and safe smart contracts can be tremendously challenging because of the complicated semantics of the underlying domain-specific languages and their testability. High-profile incidents indicate that blockchain smart contracts can contain various code-security vulnerabilities that cause financial harm. When it comes to the security of smart contracts, developers who write contracts should be capable of testing their code to diagnose security vulnerabilities before deploying it to the immutable blockchain environment. However, there are only a handful of security testing tools for smart contracts. This implies that existing research on automatic smart contract security testing is not adequate and remains in its infancy. To more readily realize the application of blockchain smart contracts in security and privacy, we should first understand their vulnerabilities before widespread implementation. Accordingly, the goal of this paper is to carry out a comprehensive experimental assessment of current static smart contract security testing tools for the most widely used blockchain, Ethereum, and its domain-specific programming language, Solidity, to provide the first...

1.2CYAug 6, 2018

Digital Blues: An Investigation into the Use of Bluetooth Protocols

William Ledbetter, William Bradley Glisson, Todd McDonald et al.

The proliferation of Bluetooth mobile device communications into all aspects of modern society raises security questions for both academics and practitioners. This environment prompted an investigation into the real-world use of Bluetooth protocols, along with an analysis of documented security attacks. The experiment discussed in this paper collected data for one week in a local coffee shop. Data collection took about an hour each day and identified 478 distinct devices. The contribution of this research is twofold. First, it provides insight into the real-world Bluetooth protocols being used by the general public. Second, it provides the foundational research necessary for future Bluetooth penetration testing research.

4.2CRAug 3, 2018

Non-Reciprocity Compensation Combined with Turbo Codes for Secret Key Generation in Vehicular Ad Hoc Social IoT Networks

Gregory Epiphaniou, Petros Karadimas, Dhouha Kbaier Ben Ismail et al.

The physical attributes of the dynamic vehicle-to-vehicle (V2V) propagation channel can be utilised to generate highly random and symmetric cryptographic keys. However, in a physical-layer key agreement scheme, non-reciprocity due to inherent channel noise and hardware impairments can propagate bit disagreements. This has to be addressed prior to symmetric key generation, which is particularly important in social Internet of Things (IoT) networks, including in adversarial settings (e.g. battlefields). In this paper, we parametrically incorporate temporal variability attributes, such as three-dimensional (3D) scattering and scatterer mobility. This is the first work to incorporate such features into the key generation process by combining non-reciprocity compensation with turbo codes. Preliminary results indicate that using turbo codes significantly improves the bit mismatch rate (BMR) and key generation rate (KGR) in comparison to sample indexing techniques.
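
The bit mismatch rate (BMR) cited in the abstract is the fraction of key bits on which the two vehicles disagree after each quantises its own channel measurements. A minimal sketch, assuming a simple one-bit mean-threshold quantiser (a common textbook choice, not the paper's sample-indexing or turbo-code scheme), shows how non-reciprocal measurements turn into a BMR; the function names are hypothetical.

```python
from statistics import mean


def quantize(samples: list[float]) -> list[int]:
    """One-bit quantiser: 1 if a channel sample is above the sequence mean, else 0."""
    threshold = mean(samples)
    return [1 if s > threshold else 0 for s in samples]


def bit_mismatch_rate(bits_a: list[int], bits_b: list[int]) -> float:
    """Fraction of positions where the two parties' key bits disagree."""
    if len(bits_a) != len(bits_b) or not bits_a:
        raise ValueError("bit strings must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(bits_a, bits_b))
    return mismatches / len(bits_a)
```

Non-reciprocity shows up as samples that fall on different sides of each party's threshold; reconciliation (turbo codes, in the paper) aims to correct exactly those disagreements before the key is used.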

14.0CRJul 27, 2018

Ubuntu One Investigation: Detecting Evidences on Client Machines

Mohammad Shariati, Ali Dehghantanha, Ben Martini et al.

Storage as a Service (STaaS) cloud services have been adopted by both individuals and businesses as a dominant technology worldwide. Like other technologies, this widely accepted service can be misused by criminals. Investigating cloud platforms is becoming a standard component of contemporary digital investigation cases. Hence, digital forensic investigators need a working knowledge of the potential evidence that might be stored on cloud services. In this chapter, we conducted a number of experiments to locate data remnants of users' activities when using the Ubuntu One cloud service. We undertook experiments based on common activities performed by users on cloud platforms, including downloading, uploading, viewing, and deleting files. We then examined the resulting digital artifacts on a range of client devices, namely Windows 8.1, Apple Mac OS X, and Apple iOS. Our examination extracted a variety of potentially evidential items, ranging from Ubuntu One databases and log files on persistent storage to remnants of user activities in device memory and network traffic.

8.5CRJul 27, 2018

Greening Cloud-Enabled Big Data Storage Forensics: Syncany as a Case Study

Yee-Yang Teing, Ali Dehghantanha, Kim-Kwang Raymond Choo

The pervasive nature of cloud-enabled big data storage solutions introduces new challenges in the identification, collection, analysis, preservation and archiving of digital evidence. Investigating such complex platforms to locate and recover traces of criminal activity is a time-consuming process. Hence, cyber forensics researchers are moving towards streamlining the investigation process by locating and documenting residual artefacts (evidence) of forensic value from users' activities on cloud-enabled big data platforms, in order to reduce the time and resources involved in real-world investigations. In this paper, we seek to determine the data remnants of forensic value from the Syncany private cloud storage service, a popular storage engine for big data platforms. We demonstrate the types and locations of the artefacts that can be forensically recovered. Findings from this research contribute to an in-depth understanding of cloud-enabled big data storage forensics, which can result in reduced time and resources spent in real-world investigations involving Syncany-based cloud platforms.

9.6CRJul 27, 2018

Ensemble-based Multi-Filter Feature Selection Method for DDoS Detection in Cloud Computing

Opeyemi Osanaiye, Kim-Kwang Raymond Choo, Ali Dehghantanha et al.

Increasing interest in the adoption of cloud computing has exposed it to cyber-attacks. One such attack is the distributed denial of service (DDoS) attack, which targets cloud bandwidth, services and resources to make them unavailable to both cloud providers and users. Due to the magnitude of traffic that needs to be processed, data mining and machine learning classification algorithms have been proposed to distinguish normal packets from anomalous ones. Feature selection has also been identified as a pre-processing phase in cloud DDoS attack defence that can potentially increase classification accuracy and reduce computational complexity by identifying important features from the original dataset during supervised learning. In this work, we propose an ensemble-based multi-filter feature selection method that combines the output of four filter methods to achieve an optimum selection. An extensive experimental evaluation of our proposed method was performed using the NSL-KDD intrusion detection benchmark dataset and a decision tree classifier. The results show that our proposed method effectively reduces the number of features from 41 to 13 and achieves a high detection rate and classification accuracy compared with other classification techniques.
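
The abstract does not say how the four filter outputs are combined. One common ensemble rule, sketched here as an assumption, is to take each filter's top-k features and keep those that at least `min_votes` filters agree on. The per-filter scores (for example information gain or chi-squared statistics) are taken as given inputs; computing them is outside this sketch, and the function names are hypothetical.

```python
from collections import Counter


def top_k(scores: dict[str, float], k: int) -> set[str]:
    """Return the k highest-scoring features for one filter method."""
    ranked = sorted(scores, key=lambda f: (-scores[f], f))  # ties broken by name
    return set(ranked[:k])


def ensemble_select(
    filter_scores: list[dict[str, float]], k: int, min_votes: int
) -> list[str]:
    """Keep features that appear in the top-k of at least `min_votes` filters."""
    votes = Counter()
    for scores in filter_scores:
        votes.update(top_k(scores, k))
    return sorted(f for f, v in votes.items() if v >= min_votes)
```

Raising `min_votes` yields a smaller feature set that more filters agree on; lowering it to 1 returns the union of every filter's top-k selection.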

4.2CRJul 26, 2018

CloudMe Forensics: A Case of Big-Data Investigation

Yee-Yang Teing, Ali Dehghantanha, Kim-Kwang Raymond Choo

The increasing volume, variety and velocity of data has been an area of concern in cloud forensics. At some point, the sheer volume of data makes it computationally prohibitive to extract and analyse fully in a timely manner. To reduce the scope of an investigation, it is important for a digital forensic practitioner to possess well-rounded knowledge of the most relevant data artefacts from the cloud product under investigation. In this paper, we examine the residual artefacts from the use of the CloudMe cloud storage service. We demonstrate the types and locations of the artefacts relating to installation, uninstallation, log-in, log-off, and file synchronisation activities on the desktop and mobile clients. Findings from this research will pave the way towards the development of data mining methods for cloud-enabled big data endpoint forensics investigation.

5.8CRJun 6, 2018

IoTChain: A Three-Tier Blockchain-based IoT Security Architecture

Zijian Bao, Wenbo Shi, Debiao He et al.

There has been increasing interest in the potential of blockchain in enhancing the security of devices and systems, such as Internet of Things (IoT). In this paper, we present a blockchain-based IoT security architecture, IoTchain. The three-tier architecture comprises an authentication layer, a blockchain layer and an application layer, and is designed to achieve identity authentication, access control, privacy protection, lightweight feature, regional node fault tolerance, denial-of-service resilience, and storage integrity. We also evaluate the performance of IoTchain to demonstrate its utility in an IoT deployment.

4.2CRApr 23, 2018

Unmanned Aerial Vehicle Forensic Investigation Process: Dji Phantom 3 Drone As A Case Study

Alan Roder, Kim-Kwang Raymond Choo, Nhien-An Le-Khac

Drones (also known as Unmanned Aerial Vehicles, UAVs) are a potential source of evidence in digital investigations, partly due to their increasing popularity in our society. However, existing UAV/drone forensics generally relies on conventional digital forensic investigation guidelines, such as those of ACPO and NIST, which may not be entirely fit for purpose. In this paper, we identify the challenges associated with UAV/drone forensics. We then explore and evaluate existing forensic guidelines in terms of their effectiveness for UAV/drone forensic investigations. Next, we present our set of guidelines for UAV/drone investigations. Finally, we demonstrate how the proposed guidelines can be used to guide a drone forensic investigation using the DJI Phantom 3 drone as a case study.

2.5CRSep 15, 2017

Performance of Android Forensics Data Recovery Tools

Bernard Chukwuemeka Ogazi-Onyemaechi, Ali Dehghantanha, Kim-Kwang Raymond Choo

Recovering deleted or hidden data is among the most important duties of forensic investigators. The extensive use of smartphones as subjects, objects or tools of crime has made them an important part of residual forensics. This chapter investigates the effectiveness of mobile forensic data recovery tools in recovering evidence from a Samsung Galaxy S2 i9100 Android phone. We seek to determine the amount of data that could be recovered using the Phone Image Carver, AccessData FTK, Foremost, DiskDigger, and Recover My Files forensic tools. The findings reveal differences in the recovery capabilities of the studied tools, showing that each is suitable only in its specialised context.

2.5CRAug 29, 2017

Investigation and Automating Extraction of Thumbnails Produced by Image viewers

Wybren van der Meer, Kim-Kwang Raymond Choo, Nhien-An Le-Khac et al.

Today, in digital forensics, images often provide important information in an investigation. However, not all images may still be available in a digital forensic investigation, for example because they have been deleted. Data carving can be used in this case to retrieve deleted images, but carving is normally time-consuming, and these images may moreover have been overwritten by other data. One solution is to examine thumbnails of images that are no longer available. These thumbnails can often be found in databases created by either the operating system or image viewers. In the literature, most research and practice focus on extracting thumbnails from databases created by the operating system. There is little research on thumbnails created by image viewers, as these thumbnails are application-driven in terms of predefined sizes, adjustments and storage locations. Thumbnail databases from image viewers are nonetheless significant forensic artefacts for investigators, as these programs handle large numbers of images. However, investigating these databases is still a manual or semi-automatic task that consumes a large amount of forensic time. Therefore, in this paper we propose a new approach to automating the extraction of thumbnails produced by image viewers. We also test our approach with popular image viewers using different storage structures and locations to show its robustness.
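
Thumbnail databases often store thumbnails as embedded JPEG streams, so a common first step in automating extraction is signature-based carving between the JPEG start-of-image (`FF D8 FF`) and end-of-image (`FF D9`) markers. The sketch below shows only that generic technique; it is not the authors' approach, and it assumes thumbnails are stored as contiguous, unencrypted JPEG data. The function name `carve_jpegs` is hypothetical.

```python
SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def carve_jpegs(blob: bytes) -> list[bytes]:
    """Return every byte run that starts at an SOI and ends at the next EOI."""
    found = []
    pos = 0
    while True:
        start = blob.find(SOI, pos)
        if start == -1:
            break
        end = blob.find(EOI, start + len(SOI))
        if end == -1:
            break  # truncated image at the end of the blob: stop carving
        found.append(blob[start:end + len(EOI)])
        pos = end + len(EOI)
    return found
```

Stopping at the first end-of-image marker is a deliberate simplification: a full-size JPEG that embeds its own EXIF thumbnail would be truncated, so a production carver would parse JPEG segment lengths instead.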

9.1CRAug 17, 2017

Medical Cyber-Physical Systems Development: A Forensics-Driven Approach

George Grispos, William Bradley Glisson, Kim-Kwang Raymond Choo

The synthesis of technology and the medical industry has partly contributed to the increasing interest in Medical Cyber-Physical Systems (MCPS). While these systems provide benefits to patients and professionals, they also introduce new attack vectors for malicious actors (e.g. financially and/or criminally motivated actors). A successful breach involving an MCPS can impact patient data and system availability. The complexity and operating requirements of an MCPS complicate digital investigations. Together with the potentially vast amount of information that an MCPS produces and/or has access to, this is generating discussion not only on how to compromise these systems but, more importantly, on how to investigate them. The paper proposes the integration of forensics principles and concepts into the design and development of an MCPS to strengthen an organization's investigative posture. The framework sets the foundation for future research in the refinement of specific solutions for MCPS investigations.

6.3CRJul 15, 2017

Forensic Investigation of P2P Cloud Storage: BitTorrent Sync as a Case Study

Teing Yee Yang, Ali Dehghantanha, Kim-Kwang Raymond Choo et al.

Cloud computing has been regarded as the technology enabler for the Internet of Things (IoT). To ensure the most effective collection of IoT-based evidence, it is vital for forensic practitioners to possess a contemporary understanding of the artefacts from different cloud services. In this paper, we seek to determine the data remnants from the use of BitTorrent Sync version 2.0. Findings from our research using mobile and computer devices running Windows 8.1, Mac OS X Mavericks 10.9.5, Ubuntu 14.04.1 LTS, iOS 7.1.2, and Android KitKat 4.4.4 suggest that artefacts relating to installation, uninstallation, log-in, log-off, and file synchronisation can be recovered; these are potential sources of evidence for IoT forensics. We also present a forensically sound investigation methodology for BitTorrent Sync.

4.5CRJun 25, 2017

Investigating America Online Instant Messaging Application: Data Remnants on Windows 8.1 Client Machine

Teing Yee Yang, Ali Dehghantanha, Kim-Kwang Raymond Choo et al.

Instant messaging applications (apps) are one potential source of evidence in a criminal investigation or civil litigation. To ensure the most effective collection of evidence, it is vital for forensic practitioners to possess up-to-date knowledge of artefacts of forensic interest from various instant messaging apps. Hence, in this chapter, we study America Online Instant Messenger (version 7.14.5.8) with the aim of contributing to an in-depth understanding of the types of terrestrial artefacts that are likely to remain after the use of instant messaging services and apps on Windows 8.1 devices. Potential artefacts identified during the research include data relating to installation or uninstallation, log-in and log-off information, contact lists, conversations, and transferred files.

6.3CRJun 25, 2017

Honeypots for employee information security awareness and education training: A conceptual EASY training model

Lek Christopher, Kim-Kwang Raymond Choo, Ali Dehghantanha

The increasing pervasiveness of internet-connected systems means that such systems will continue to be exploited for criminal purposes by cybercriminals (including malicious insiders such as employees and vendors). The importance of protecting corporate systems and intellectual property, and the escalating complexity of the online environment, underscore the need for ongoing information security awareness and education training and the promotion of a culture of security among employees. Two honeypots were deployed at a private university in Singapore. Findings from the analysis of the honeypot data are presented in this paper. The paper then examines how analysis of honeypot data can be used in employee information security awareness and education training. Adapting the Routine Activity Theory, a criminology theory widely used in the study of cybercrime, this paper proposes a conceptual Engaging Stakeholders, Acceptable Behavior, Simple Teaching method, Yardstick (EASY) training model and explains how the model can be used to design employee information security awareness and education training. Future research directions are also outlined.

6.3CRJun 25, 2017

Cloud Storage Forensics: Analysis of Data Remnants on SpiderOak, JustCloud, and pCloud

SeyedHossein Mohtasebi, Ali Dehghantanha, Kim-Kwang Raymond Choo

The benefits of Storage as a Service (STaaS) cloud platforms, such as access to data anywhere, anytime, on a wide range of devices, have made them very popular among businesses and individuals. As such, forensic investigators increasingly face cases that involve the investigation of STaaS platforms. Therefore, it is essential for cyber investigators to know how to collect, preserve, and analyse evidence from these platforms. In this paper, we describe the investigation of three STaaS platforms, namely SpiderOak, JustCloud, and pCloud, on Windows 8.1 and iOS 8.1.1 devices. Moreover, we track possible changes to the metadata of uploaded and downloaded files on these platforms and investigate their forensic value.

14.1CRMar 17, 2016

Windows Instant Messaging App Forensics: Facebook and Skype as Case Studies

Teing Yee Yang, Ali Dehghantanha, Kim-Kwang Raymond Choo et al.

Instant messaging (IM) has changed the way people communicate with each other. However, the interactive and instant nature of these applications (apps) has made them an attractive choice for malicious cyber activities such as phishing. The forensic examination of IM apps on modern Windows 8.1 (or later) has been largely unexplored, as the platform is relatively new. In this paper, we seek to determine the data remnants from the use of two popular Windows Store instant messaging apps, namely Facebook and Skype, on a Windows 8.1 client machine. This research contributes to an in-depth understanding of the types of terrestrial artefacts that are likely to remain after the use of instant messaging services and application software on a contemporary Windows operating system. Potential artefacts detected during the research include data relating to the installation or uninstallation of the instant messaging application software, log-in and log-off information, contact lists, conversations, and transferred files.

13.4CRSep 23, 2015

A Forensically Sound Adversary Model for Mobile Devices

Quang Do, Ben Martini, Kim-Kwang Raymond Choo

In this paper, we propose an adversary model to facilitate forensic investigations of mobile devices (e.g. Android, iOS and Windows smartphones) that can be readily adapted to the latest mobile device technologies. This is essential given the ongoing and rapidly changing nature of mobile device technologies. An integral principle and significant constraint upon forensic practitioners is that of forensic soundness. Our adversary model specifically considers and integrates the constraints of forensic soundness on the adversary, in our case a forensic practitioner. One construction of the adversary model is an evidence collection and analysis methodology for Android devices. Using this methodology with six popular cloud apps, we successfully extracted various items of forensic interest from both the external and internal storage of the mobile device.

5.7CRSep 23, 2015

Efficient and Anonymous Two-Factor User Authentication in Wireless Sensor Networks: Achieving User Anonymity with Lightweight Sensor Computation

Junghyun Nam, Kim-Kwang Raymond Choo, Sangchul Han et al.

A smart-card-based user authentication scheme for wireless sensor networks (hereafter referred to as an SCA-WSN scheme) is designed to ensure that only users who possess both a smart card and the corresponding password are allowed to gain access to sensor data and their transmissions. Despite many research efforts in recent years, it remains a challenging task to design an efficient SCA-WSN scheme that achieves user anonymity. The majority of published SCA-WSN schemes use only lightweight cryptographic techniques (rather than public-key cryptographic techniques) for the sake of efficiency, and have been shown to be unable to provide user anonymity. Some schemes employ elliptic curve cryptography for better security but require sensors with strict resource constraints to perform computationally expensive scalar-point multiplications; despite the increased computational requirements, these schemes do not provide user anonymity. In this paper, we present a new SCA-WSN scheme that not only achieves user anonymity but is also efficient in terms of the computation load on sensors. Our scheme employs elliptic curve cryptography but restricts its use to anonymous user-to-gateway authentication, thereby allowing sensors to perform only lightweight cryptographic operations. Our scheme also enjoys provable security in a formal model extended from the widely accepted Bellare-Pointcheval-Rogaway (2000) model to capture the user anonymity property and various SCA-WSN-specific attacks (e.g., stolen smart card attacks, node capture attacks, privileged insider attacks, and stolen verifier attacks).

8.6CYJun 18, 2015

Mobile Cloud Forensics: An Analysis of Seven Popular Android Apps

Ben Martini, Quang Do, Kim-Kwang Raymond Choo

Using the evidence collection and analysis methodology for Android devices proposed by Martini, Do and Choo, we examined and analyzed seven popular Android cloud-based apps. First, we analyzed each app to see what information could be obtained from its private app storage and SD card directories. We collated this information and used it to aid our investigation of each app's database files and AccountManager data. To complete our understanding of the forensic artefacts stored by the apps, we performed further analysis to determine whether user authentication credentials could be collected for each app based on the information gained in the initial analysis stages. The contributions of this research include a detailed description of the artefacts of general forensic interest for each app analyzed.