41.3SDMar 27Code
AFSS: Artifact-Focused Self-Synthesis for Mitigating Bias in Audio Deepfake DetectionHai-Son Nguyen-Le, Hung-Cuong Nguyen-Thanh, Nhien-An Le-Khac et al.
The rapid advancement of generative models has enabled highly realistic audio deepfakes, yet current detectors suffer from a critical bias problem, leading to poor generalization across unseen datasets. This paper proposes Artifact-Focused Self-Synthesis (AFSS), a method designed to mitigate this bias by generating pseudo-fake samples from real audio via two mechanisms: self-conversion and self-reconstruction. The core insight of AFSS lies in enforcing same-speaker constraints, ensuring that real and pseudo-fake samples share identical speaker identity and semantic content. This forces the detector to focus exclusively on generation artifacts rather than irrelevant confounding factors. Furthermore, we introduce a learnable reweighting loss to dynamically emphasize synthetic samples during training. Extensive experiments across 7 datasets demonstrate that AFSS achieves state-of-the-art performance with an average EER of 5.45\%, including a significant reduction to 1.23\% on WaveFake and 2.70\% on In-the-Wild, all while eliminating the dependency on pre-collected fake datasets. Our code is publicly available at https://github.com/NguyenLeHaiSonGit/AFSS.
AIJul 15, 2022
Knowledge Representation in Digital Agriculture: A Step Towards Standardised ModelQuoc Hung Ngo, Tahar Kechadi, Nhien-An Le-Khac
In recent years, data science has evolved significantly. Data analysis and mining processes become routines in all sectors of the economy where datasets are available. Vast data repositories have been collected, curated, stored, and used for extracting knowledge. And this is becoming commonplace. Subsequently, we extract a large amount of knowledge, either directly from the data or through experts in the given domain. The challenge now is how to exploit all this large amount of knowledge that is previously known for efficient decision-making processes. Until recently, much of the knowledge gained through a number of years of research is stored in static knowledge bases or ontologies, while more diverse and dynamic knowledge acquired from data mining studies is not centrally and consistently managed. In this research, we propose a novel model called ontology-based knowledge map to represent and store the results (knowledge) of data mining in crop farming to build, maintain, and enrich the process of knowledge discovery. The proposed model consists of six main sets: concepts, attributes, relations, transformations, instances, and states. This model is dynamic and facilitates the access, updates, and exploitation of the knowledge at any time. This paper also proposes an architecture for handling this knowledge-based model. The system architecture includes knowledge modelling, extraction, assessment, publishing, and exploitation. This system has been implemented and used in agriculture for crop management and monitoring. It is proven to be very effective and promising for its extension to other domains.
AISep 29, 2022
OAK4XAI: Model towards Out-Of-Box eXplainable Artificial Intelligence for Digital AgricultureQuoc Hung Ngo, Tahar Kechadi, Nhien-An Le-Khac
Recent machine learning approaches have been effective in Artificial Intelligence (AI) applications. They produce robust results with a high level of accuracy. However, most of these techniques do not provide human-understandable explanations for supporting their results and decisions. They usually act as black boxes, and it is not easy to understand how decisions have been made. Explainable Artificial Intelligence (XAI), which has received much interest recently, tries to provide human-understandable explanations for decision-making and trained AI models. For instance, in digital agriculture, related domains often present peculiar or input features with no link to background knowledge. The application of the data mining process on agricultural data leads to results (knowledge), which are difficult to explain. In this paper, we propose a knowledge map model and an ontology design as an XAI framework (OAK4XAI) to deal with this issue. The framework does not only consider the data analysis part of the process, but it takes into account the semantics aspect of the domain knowledge via an ontology and a knowledge map model, provided as modules of the framework. Many ongoing XAI studies aim to provide accurate and verbalizable accounts for how given feature values contribute to model decisions. The proposed approach, however, focuses on providing consistent information and definitions of concepts, algorithms, and values involved in the data mining models. We built an Agriculture Computing Ontology (AgriComO) to explain the knowledge mined in agriculture. AgriComO has a well-designed structure and includes a wide range of concepts and transformations suitable for agriculture and computing domains.
LGOct 20, 2023
Enhancing Illicit Activity Detection using XAI: A Multimodal Graph-LLM FrameworkJack Nicholls, Aditya Kuppa, Nhien-An Le-Khac
Financial cybercrime prevention is an increasing issue with many organisations and governments. As deep learning models have progressed to identify illicit activity on various financial and social networks, the explainability behind the model decisions has been lacklustre with the investigative analyst at the heart of any deep learning platform. In our paper, we present a state-of-the-art, novel multimodal proactive approach to addressing XAI in financial cybercrime detection. We leverage a triad of deep learning models designed to distill essential representations from transaction sequencing, subgraph connectivity, and narrative generation to significantly streamline the analyst's investigative process. Our narrative generation proposal leverages LLM to ingest transaction details and output contextual narrative for an analyst to understand a transaction and its metadata much further.
CRSep 11, 2024
D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial AttackHong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen et al.
The advancements in generative AI have enabled the improvement of audio synthesis models, including text-to-speech and voice conversion. This raises concerns about its potential misuse in social manipulation and political interference, as synthetic speech has become indistinguishable from natural human speech. Several speech-generation programs are utilized for malicious purposes, especially impersonating individuals through phone calls. Therefore, detecting fake audio is crucial to maintain social security and safeguard the integrity of information. Recent research has proposed a D-CAPTCHA system based on the challenge-response protocol to differentiate fake phone calls from real ones. In this work, we study the resilience of this system and introduce a more robust version, D-CAPTCHA++, to defend against fake calls. Specifically, we first expose the vulnerability of the D-CAPTCHA system under transferable imperceptible adversarial attack. Secondly, we mitigate such vulnerability by improving the robustness of the system by using adversarial training in D-CAPTCHA deepfake detectors and task classifiers.
LGOct 4, 2023
Crossed-IoT device portability of Electromagnetic Side Channel Analysis: Challenges and DatasetTharindu Lakshan Yasarathna, Lojenaa Navanesan, Simon Barque et al.
IoT (Internet of Things) refers to the network of interconnected physical devices, vehicles, home appliances, and other items embedded with sensors, software, and connectivity, enabling them to collect and exchange data. IoT Forensics is collecting and analyzing digital evidence from IoT devices to investigate cybercrimes, security breaches, and other malicious activities that may have taken place on these connected devices. In particular, EM-SCA has become an essential tool for IoT forensics due to its ability to reveal confidential information about the internal workings of IoT devices without interfering these devices or wiretapping their networks. However, the accuracy and reliability of EM-SCA results can be limited by device variability, environmental factors, and data collection and processing methods. Besides, there is very few research on these limitations that affects significantly the accuracy of EM-SCA approaches for the crossed-IoT device portability as well as limited research on the possible solutions to address such challenge. Therefore, this empirical study examines the impact of device variability on the accuracy and reliability of EM-SCA approaches, in particular machine-learning (ML) based approaches for EM-SCA. We firstly presents the background, basic concepts and techniques used to evaluate the limitations of current EM-SCA approaches and datasets. Our study then addresses one of the most important limitation, which is caused by the multi-core architecture of the processors (SoC). We present an approach to collect the EM-SCA datasets and demonstrate the feasibility of using transfer learning to obtain more meaningful and reliable results from EM-SCA in IoT forensics of crossed-IoT devices. Our study moreover contributes a new dataset for using deep learning models in analysing Electromagnetic Side-Channel data with regards to the cross-device portability matter.
36.6CVMar 19
Transferable Multi-Bit Watermarking Across Frozen Diffusion Models via Latent Consistency BridgesHong-Hanh Nguyen-Le, Van-Tuan Tran, Thuc D. Nguyen et al.
As diffusion models (DMs) enable photorealistic image generation at unprecedented scale, watermarking techniques have become essential for provenance establishment and accountability. Existing methods face challenges: sampling-based approaches operate on frozen models but require costly $N$-step Denoising Diffusion Implicit Models (DDIM) inversion (typically N=50) for zero-bit-only detection; fine-tuning-based methods achieve fast multi-bit extraction but couple the watermark to a specific model checkpoint, requiring retraining for each architecture. We propose DiffMark, a plug-and-play watermarking method that offers three key advantages over existing approaches: single-pass multi-bit detection, per-image key flexibility, and cross-model transferability. Rather than encoding the watermark into the initial noise vector, DiffMark injects a persistent learned perturbation $δ$ at every denoising step of a completely frozen DM. The watermark signal accumulates in the final denoised latent $z_0$ and is recovered in a single forward pass. The central challenge of backpropagating gradients through a frozen UNet without traversing the full denoising chain is addressed by employing Latent Consistency Models (LCM) as a differentiable training bridge. This reduces the number of gradient steps from 50 DDIM to 4 LCM and enables a single-pass detection at 16.4 ms, a 45x speedup over sampling-based methods. Moreover, by this design, the encoder learns to map any runtime secret to a unique perturbation at inference time, providing genuine per-image key flexibility and transferability to unseen diffusion-based architectures without per-model fine-tuning. Although achieving these advantages, DiffMark also maintains competitive watermark robustness against distortion, regeneration, and adversarial attacks.
LGFeb 9
Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing ViewsDuc-Anh Nguyen, Nhien-An Le-Khac
Multimodal multiview learning seeks to integrate information from diverse sources to enhance task performance. Existing approaches often struggle with flexible view configurations, including arbitrary view combinations, numbers of views, and heterogeneous modalities. Focusing on the context of human activity recognition, we propose RALIS, a model that combines multiview contrastive learning with a mixture-of-experts module to support arbitrary view availability during both training and inference. Instead of trying to reconstruct missing views, an adjusted center contrastive loss is used for self-supervised representation learning and view alignment, mitigating the impact of missing views on multiview fusion. This loss formulation allows for the integration of view weights to account for view quality. Additionally, it reduces computational complexity from $O(V^2)$ to $O(V)$, where $V$ is the number of views. To address residual discrepancies not captured by contrastive learning, we employ a mixture-of-experts module with a specialized load balancing strategy, tasked with adapting to arbitrary view combinations. We highlight the geometric relationship among components in our model and how they combine well in the latent space. RALIS is validated on four datasets encompassing inertial and human pose modalities, with the number of views ranging from three to nine, demonstrating its performance and flexibility.
CVMay 24, 2025Code
Think Twice before Adaptation: Improving Adaptability of DeepFake Detection via Online Test-Time AdaptationHong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen et al.
Deepfake (DF) detectors face significant challenges when deployed in real-world environments, particularly when encountering test samples deviated from training data through either postprocessing manipulations or distribution shifts. We demonstrate postprocessing techniques can completely obscure generation artifacts presented in DF samples, leading to performance degradation of DF detectors. To address these challenges, we propose Think Twice before Adaptation (\texttt{T$^2$A}), a novel online test-time adaptation method that enhances the adaptability of detectors during inference without requiring access to source training data or labels. Our key idea is to enable the model to explore alternative options through an Uncertainty-aware Negative Learning objective rather than solely relying on its initial predictions as commonly seen in entropy minimization (EM)-based approaches. We also introduce an Uncertain Sample Prioritization strategy and Gradients Masking technique to improve the adaptation by focusing on important samples and model parameters. Our theoretical analysis demonstrates that the proposed negative learning objective exhibits complementary behavior to EM, facilitating better adaptation capability. Empirically, our method achieves state-of-the-art results compared to existing test-time adaptation (TTA) approaches and significantly enhances the resilience and generalization of DF detectors during inference. Code is available \href{https://github.com/HongHanh2104/T2A-Think-Twice-Before-Adaptation}{here}.
MMDec 2, 2020Code
Retracing the Flow of the Stream: Investigating Kodi Streaming ServicesSamuel Todd Bromley, John Sheppard, Mark Scanlon et al.
Kodi is of one of the world's largest open-source streaming platforms for viewing video content. Easily installed Kodi add-ons facilitate access to online pirated videos and streaming content by facilitating the user to search and view copyrighted videos with a basic level of technical knowledge. In some countries, there have been paid child sexual abuse organizations publishing/streaming child abuse material to an international paying clientele. Open source software used for viewing videos from the Internet, such as Kodi, is being exploited by criminals to conduct their activities. In this paper, we describe a new method to quickly locate Kodi artifacts and gather information for a successful prosecution. We also evaluate our approach on different platforms; Windows, Android and Linux. Our experiments show the file location, artifacts and a history of viewed content including their locations from the Internet. Our approach will serve as a resource to forensic investigators to examine Kodi or similar streaming platforms.
CRNov 7, 2020Code
Identifying interception possibilities for WhatsApp communicationDennis Wijnberg, Nhien-An Le-Khac
On a daily basis, law enforcement officers struggle with suspects using mobile communication applications for criminal activities. These mobile applications replaced SMS-messaging and evolved the last few years from plain-text data transmission and storage to an encrypted version. Regardless of the benefits for all law abiding citizens, this is considered to be the downside for criminal investigations. Normal smartphone, computer or network investigations do no longer provide the contents of the communication in real-time when suspects are using apps like WhatsApp, Signal or Telegram. Among them, WhatsApp is one of the most common smartphone applications for communication, both criminal as well as legal activities. Early 2016 WhatsApp introduced end-to-end encryption for all users, immediately keeping law enforcement officers around the world in the dark. Existing research to recuperate the position of law enforcement is limited to a single field of investigation and often limited to post mortem research on smartphone or computer while wiretapping is limited to metadata information. Therefore, it provides only historical data or metadata while law enforcement officers want a continuous stream of live and substantive information. This paper identified that gap in available scenarios for law enforcement investigations and identified a gap in methods available for forensic acquiring and processing these scenarios. In this paper, we propose a forensic approach to create real-time insight in the WhatsApp communication. Our approach is based on the wiretapping, decrypting WhatsApp databases, open source intelligence and WhatsApp Web communication analysis. We also evaluate our method with different scenarios in WhatsApp forensics to prove its feasibility and efficiency.
27.6CRMay 4
ChaRVoC: A Challenge-Response Voice Cancelable Authentication SystemPhuc-Khang Vo-Hoang, Hoang C. Ta, Nhien-An Le-Khac et al.
In this work, we present a Challenge-Response Voice Cancelable authentication system, called ChaRVoC, which provides protection against replay attacks, revocability issues, and template compromise. Our approach integrates three security factors: (1) inherent voice biometric characteristics, (2) user-memorized secret keys enabling template revocability, and (3) dynamic system-generated challenges providing liveness detection. Specifically, we introduce a novel HashGray-XOR scheme which combines a cryptographic hash function with an unrecoverable graycode-based transformation to create secured templates that are mathematically proven to be non-invertible. We compare our methods with existing cancelable biometric methods (WTA, IoM, RoE) on VoxCeleb1, TIMIT, and VOiCES datasets to show the recognition performance of our proposed system. We also show that our system achieves both cancelability and unlinkability properties.
CVNov 26, 2024
Passive Deepfake Detection Across Multi-modalities: A Comprehensive SurveyHong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen et al.
In recent years, deepfakes (DFs) have been utilized for malicious purposes, such as individual impersonation, misinformation spreading, and artists style imitation, raising questions about ethical and security concerns. In this survey, we provide a comprehensive review and comparison of passive DF detection across multiple modalities, including image, video, audio, and multi-modal, to explore the inter-modality relationships between them. Beyond detection accuracy, we extend our analysis to encompass crucial performance dimensions essential for real-world deployment: generalization capabilities across novel generation techniques, robustness against adversarial manipulations and postprocessing techniques, attribution precision in identifying generation sources, and resilience under real-world operational conditions. Additionally, we analyze the advantages and limitations of existing datasets, benchmarks, and evaluation metrics for passive DF detection. Finally, we propose future research directions that address these unexplored and emerging issues in the field of passive DF detection. This survey offers researchers and practitioners a comprehensive resource for understanding the current landscape, methodological approaches, and promising future directions in this rapidly evolving field.
SPApr 25, 2024
SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep LearningDuc-Anh Nguyen, Nhien-An Le-Khac
Human Activity Recognition (HAR) is a well-studied field with research dating back to the 1980s. Over time, HAR technologies have evolved significantly from manual feature extraction, rule-based algorithms, and simple machine learning models to powerful deep learning models, from one sensor type to a diverse array of sensing modalities. The scope has also expanded from recognising a limited set of activities to encompassing a larger variety of both simple and complex activities. However, there still exist many challenges that hinder advancement in complex activity recognition using modern deep learning methods. In this paper, we comprehensively systematise factors leading to inaccuracy in complex HAR, such as data variety and model capacity. Among many sensor types, we give more attention to wearable and camera due to their prevalence. Through this Systematisation of Knowledge (SoK) paper, readers can gain a solid understanding of the development history and existing challenges of HAR, different categorisations of activities, obstacles in deep learning-based complex HAR that impact accuracy, and potential research directions.
LGNov 23, 2025
Beyond Binary Classification: A Semi-supervised Approach to Generalized AI-generated Image DetectionHong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen et al.
The rapid advancement of generators (e.g., StyleGAN, Midjourney, DALL-E) has produced highly realistic synthetic images, posing significant challenges to digital media authenticity. These generators are typically based on a few core architectural families, primarily Generative Adversarial Networks (GANs) and Diffusion Models (DMs). A critical vulnerability in current forensics is the failure of detectors to achieve cross-generator generalization, especially when crossing architectural boundaries (e.g., from GANs to DMs). We hypothesize that this gap stems from fundamental differences in the artifacts produced by these \textbf{distinct architectures}. In this work, we provide a theoretical analysis explaining how the distinct optimization objectives of the GAN and DM architectures lead to different manifold coverage behaviors. We demonstrate that GANs permit partial coverage, often leading to boundary artifacts, while DMs enforce complete coverage, resulting in over-smoothing patterns. Motivated by this analysis, we propose the \textbf{Tri}archy \textbf{Detect}or (TriDetect), a semi-supervised approach that enhances binary classification by discovering latent architectural patterns within the "fake" class. TriDetect employs balanced cluster assignment via the Sinkhorn-Knopp algorithm and a cross-view consistency mechanism, encouraging the model to learn fundamental architectural distincts. We evaluate our approach on two standard benchmarks and three in-the-wild datasets against 13 baselines to demonstrate its generalization capability to unseen generators.
CRSep 30, 2025
SoK: Systematic analysis of adversarial threats against deep learning approaches for autonomous anomaly detection systems in SDN-IoT networksTharindu Lakshan Yasarathna, Nhien-An Le-Khac
Integrating SDN and the IoT enhances network control and flexibility. DL-based AAD systems improve security by enabling real-time threat detection in SDN-IoT networks. However, these systems remain vulnerable to adversarial attacks that manipulate input data or exploit model weaknesses, significantly degrading detection accuracy. Existing research lacks a systematic analysis of adversarial vulnerabilities specific to DL-based AAD systems in SDN-IoT environments. This SoK study introduces a structured adversarial threat model and a comprehensive taxonomy of attacks, categorising them into data, model, and hybrid-level threats. Unlike previous studies, we systematically evaluate white, black, and grey-box attack strategies across popular benchmark datasets. Our findings reveal that adversarial attacks can reduce detection accuracy by up to 48.4%, with Membership Inference causing the most significant drop. C&W and DeepFool achieve high evasion success rates. However, adversarial training enhances robustness, and its high computational overhead limits the real-time deployment of SDN-IoT applications. We propose adaptive countermeasures, including real-time adversarial mitigation, enhanced retraining mechanisms, and explainable AI-driven security frameworks. By integrating structured threat models, this study offers a more comprehensive approach to attack categorisation, impact assessment, and defence evaluation than previous research. Our work highlights critical vulnerabilities in existing DL-based AAD models and provides practical recommendations for improving resilience, interpretability, and computational efficiency. This study serves as a foundational reference for researchers and practitioners seeking to enhance DL-based AAD security in SDN-IoT networks, offering a systematic adversarial threat model and conceptual defence evaluation based on prior empirical studies.
LGSep 14, 2025
Stabilizing Data-Free Model ExtractionDat-Thinh Nguyen, Kim-Hung Le, Nhien-An Le-Khac
Model extraction is a severe threat to Machine Learning-as-a-Service systems, especially through data-free approaches, where dishonest users can replicate the functionality of a black-box target model without access to realistic data. Despite recent advancements, existing data-free model extraction methods suffer from the oscillating accuracy of the substitute model. This oscillation, which could be attributed to the constant shift in the generated data distribution during the attack, makes the attack impractical since the optimal substitute model cannot be determined without access to the target model's in-distribution data. Hence, we propose MetaDFME, a novel data-free model extraction method that employs meta-learning in the generator training to reduce the distribution shift, aiming to mitigate the substitute model's accuracy oscillation. In detail, we train our generator to iteratively capture the meta-representations of the synthetic data during the attack. These meta-representations can be adapted with a few steps to produce data that facilitates the substitute model to learn from the target model while reducing the effect of distribution shifts. Our experiments on popular baseline image datasets, MNIST, SVHN, CIFAR-10, and CIFAR-100, demonstrate that MetaDFME outperforms the current state-of-the-art data-free model extraction method while exhibiting a more stable substitute model's accuracy during the attack.
IRMar 21, 2021
Structural Textile Pattern Recognition and Processing Based on HypergraphsVuong M. Ngo, Sven Helmer, Nhien-An Le-Khac et al.
The humanities, like many other areas of society, are currently undergoing major changes in the wake of digital transformation. However, in order to make collection of digitised material in this area easily accessible, we often still lack adequate search functionality. For instance, digital archives for textiles offer keyword search, which is fairly well understood, and arrange their content following a certain taxonomy, but search functionality at the level of thread structure is still missing. To facilitate the clustering and search, we introduce an approach for recognising similar weaving patterns based on their structures for textile archives. We first represent textile structures using hypergraphs and extract multisets of k-neighbourhoods describing weaving patterns from these graphs. Then, the resulting multisets are clustered using various distance measures and various clustering algorithms (K-Means for simplicity and hierarchical agglomerative algorithms for precision). We evaluate the different variants of our approach experimentally, showing that this can be implemented efficiently (meaning it has linear complexity), and demonstrate its quality to query and cluster datasets containing large textile samples. As, to the est of our knowledge, this is the first practical approach for explicitly modelling complex and irregular weaving patterns usable for retrieval, we aim at establishing a solid baseline.
CVDec 2, 2020
Assessing the Influencing Factors on the Accuracy of Underage Facial Age EstimationFelix Anda, Brett A. Becker, David Lillis et al.
Swift response to the detection of endangered minors is an ongoing concern for law enforcement. Many child-focused investigations hinge on digital evidence discovery and analysis. Automated age estimation techniques are needed to aid in these investigations to expedite this evidence discovery process, and decrease investigator exposure to traumatic material. Automated techniques also show promise in decreasing the overflowing backlog of evidence obtained from increasing numbers of devices and online services. A lack of sufficient training data combined with natural human variance has been long hindering accurate automated age estimation -- especially for underage subjects. This paper presented a comprehensive evaluation of the performance of two cloud age estimation services (Amazon Web Service's Rekognition service and Microsoft Azure's Face API) against a dataset of over 21,800 underage subjects. The objective of this work is to evaluate the influence that certain human biometric factors, facial expressions, and image quality (i.e. blur, noise, exposure and resolution) have on the outcome of automated age estimation services. A thorough evaluation allows us to identify the most influential factors to be overcome in future age estimation systems.
CRDec 2, 2020
SoK: Exploring the State of the Art and the Future Potential of Artificial Intelligence in Digital Forensic InvestigationXiaoyu Du, Chris Hargreaves, John Sheppard et al.
Multi-year digital forensic backlogs have become commonplace in law enforcement agencies throughout the globe. Digital forensic investigators are overloaded with the volume of cases requiring their expertise compounded by the volume of data to be processed. Artificial intelligence is often seen as the solution to many big data problems. This paper summarises existing artificial intelligence based tools and approaches in digital forensics. Automated evidence processing leveraging artificial intelligence based techniques shows great promise in expediting the digital forensic analysis process while increasing case processing capacities. For each application of artificial intelligence highlighted, a number of current challenges and future potential impact is discussed.
DBNov 20, 2020
OAK: Ontology-Based Knowledge Map Model for Digital AgricultureQuoc Hung Ngo, Tahar Kechadi, Nhien-An Le-Khac
Nowadays, a huge amount of knowledge has been amassed in digital agriculture. This knowledge and know-how information are collected from various sources, hence the question is how to organise this knowledge so that it can be efficiently exploited. Although this knowledge about agriculture practices can be represented using ontology, rule-based expert systems, or knowledge model built from data mining processes, the scalability still remains an open issue. In this study, we propose a knowledge representation model, called an ontology-based knowledge map, which can collect knowledge from different sources, store it, and exploit either directly by stakeholders or as an input to the knowledge discovery process (Data Mining). The proposed model consists of two stages, 1) build an ontology as a knowledge base for a specific domain and data mining concepts, and 2) build the ontology-based knowledge map model for representing and storing the knowledge mined on the crop datasets. A framework of the proposed model has been implemented in agriculture domain. It is an efficient and scalable model, and it can be used as knowledge repository a digital agriculture.
CRAug 13, 2020
Detecting Abnormal Traffic in Large-Scale NetworksMahmoud Said Elsayed, Nhien-An Le-Khac, Soumyabrata Dev et al.
With the rapid technological advancements, organizations need to rapidly scale up their information technology (IT) infrastructure viz. hardware, software, and services, at a low cost. However, the dynamic growth in the network services and applications creates security vulnerabilities and new risks that can be exploited by various attacks. For example, User to Root (U2R) and Remote to Local (R2L) attack categories can cause a significant damage and paralyze the entire network system. Such attacks are not easy to detect due to the high degree of similarity to normal traffic. While network anomaly detection systems are being widely used to classify and detect malicious traffic, there are many challenges to discover and identify the minority attacks in imbalanced datasets. In this paper, we provide a detailed and systematic analysis of the existing Machine Learning (ML) approaches that can tackle most of these attacks. Furthermore, we propose a Deep Learning (DL) based framework using Long Short Term Memory (LSTM) autoencoder that can accurately detect malicious traffics in network traffic. We perform our experiments in a publicly available dataset of Intrusion Detection Systems (IDSs). We obtain a significant improvement in attack detection, as compared to other benchmarking methods. Hence, our method provides great confidence in securing these networks from malicious traffic.
CRJun 24, 2020
DDoSNet: A Deep-Learning Model for Detecting Network AttacksMahmoud Said Elsayed, Nhien-An Le-Khac, Soumyabrata Dev et al.
Software-Defined Networking (SDN) is an emerging paradigm, which evolved in recent years to address the weaknesses in traditional networks. The significant feature of the SDN, which is achieved by disassociating the control plane from the data plane, facilitates network management and allows the network to be efficiently programmable. However, the new architecture can be susceptible to several attacks that lead to resource exhaustion and prevent the SDN controller from supporting legitimate users. One of these attacks, which nowadays is growing significantly, is the Distributed Denial of Service (DDoS) attack. DDoS attack has a high impact on crashing the network resources, making the target servers unable to support the valid users. The current methods deploy Machine Learning (ML) for intrusion detection against DDoS attacks in the SDN network using the standard datasets. However, these methods suffer several drawbacks, and the used datasets do not contain the most recent attack patterns - hence, lacking in attack diversity. In this paper, we propose DDoSNet, an intrusion detection system against DDoS attacks in SDN environments. Our method is based on Deep Learning (DL) technique, combining the Recurrent Neural Network (RNN) with autoencoder. We evaluate our model using the newly released dataset CICDDoS2019, which contains a comprehensive variety of DDoS attacks and addresses the gaps of the existing current datasets. We obtain a significant improvement in attack detection, as compared to other benchmarking methods. Hence, our model provides great confidence in securing these networks.
LGDec 3, 2019
Predicting Soil pH by Using Nearest FieldsQuoc Hung Ngo, Nhien-An Le-Khac, Tahar Kechadi
In precision agriculture (PA), soil sampling and testing operation is prior to planting any new crop. It is an expensive operation since there are many soil characteristics to take into account. This paper gives an overview of soil characteristics and their relationships with crop yield and soil profiling. We propose an approach for predicting soil pH based on nearest neighbour fields. It implements spatial radius queries and various regression techniques in data mining. We use soil dataset containing about 4,000 fields profiles to evaluate them and analyse their robustness. A comparative study indicates that LR, SVR, and GBRT techniques achieved high accuracy, with the R_2 values of about 0.718 and MAE values of 0.29. The experimental results showed that the proposed approach is very promising and can contribute significantly to PA.
AIOct 23, 2019
Knowledge Map: Toward a New Approach Supporting the Knowledge Management in Distributed Data MiningNhien-An Le-Khac, Lamine M. Aouad, M-Tahar Kechadi
Distributed data mining (DDM) deals with the problem of finding patterns or models, called knowledge, in an environment with distributed data and computations. Today, a massive amounts of data which are often geographically distributed and owned by different organisation are being mined. As consequence, a large mount of knowledge are being produced. This causes problems of not only knowledge management but also visualization in data mining. Besides, the main aim of DDM is to exploit fully the benefit of distributed data analysis while minimising the communication. Existing DDM techniques perform partial analysis of local data at individual sites and then generate a global model by aggregating these local results. These two steps are not independent since naive approaches to local analysis may produce an incorrect and ambiguous global data model. The integrating and cooperating of these two steps need an effective knowledge management, concretely an efficient map of knowledge in order to take the advantage of mined knowledge to guide mining the data. In this paper, we present "knowledge map", a representation of knowledge about mined knowledge. This new approach aims to manage efficiently mined knowledge in large scale distributed platform such as Grid. This knowledge map is used to facilitate not only the visualization, evaluation of mining results but also the coordinating of local mining process and existing knowledge to increase the accuracy of final model.
CROct 2, 2019
Machine-Learning Techniques for Detecting Attacks in SDNMahmoud Said Elsayed, Nhien-An Le-Khac, Soumyabrata Dev et al.
With the advent of Software Defined Networks (SDNs), there has been a rapid advancement in the area of cloud computing. It is now scalable, cheaper, and easier to manage. However, SDNs are more prone to security vulnerabilities as compared to legacy systems. Therefore, machine-learning techniques are now deployed in the SDN infrastructure for the detection of malicious traffic. In this paper, we provide a systematic benchmarking analysis of the existing machine-learning techniques for the detection of malicious traffic in SDNs. We identify the limitations in these classical machine-learning based methods, and lay the foundation for a more robust framework. Our experiments are performed on a publicly available dataset of Intrusion Detection Systems (IDSs).
CRAug 27, 2019
A Security-Aware Access Model for Data-Driven EHR SystemNgoc Hong Tran, Thien-An Nguyen-Ngoc, Nhien-An Le-Khac et al.
Digital healthcare systems are very popular lately, as they provide a variety of helpful means to monitor people's health state as well as to protect people against an unexpected health situation. These systems contain a huge amount of personal information in a form of electronic health records that are not allowed to be disclosed to unauthorized users. Hence, health data and information need to be protected against attacks and thefts. In this paper, we propose a secure distributed architecture for healthcare data storage and analysis. It uses a novel security model to rigorously control permissions of accessing sensitive data in the system, as well as to protect the transmitted data between distributed system servers and nodes. The model also satisfies the NIST security requirements. Thorough experimental results show that the model is very promising.
CVJul 2, 2019
Improving Borderline Adulthood Facial Age Estimation through Ensemble LearningFelix Anda, David Lillis, Aikaterini Kanta et al.
Achieving high performance for facial age estimation with subjects in the borderline between adulthood and non-adulthood has always been a challenge. Several studies have used different approaches from the age of a baby to an elder adult and different datasets have been employed to measure the mean absolute error (MAE) ranging between 1.47 to 8 years. The weakness of the algorithms specifically in the borderline has been a motivation for this paper. In our approach, we have developed an ensemble technique that improves the accuracy of underage estimation in conjunction with our deep learning model (DS13K) that has been fine-tuned on the Deep Expectation (DEX) model. We have achieved an accuracy of 68% for the age group 16 to 17 years old, which is 4 times better than the DEX accuracy for such age range. We also present an evaluation of existing cloud-based and offline facial age prediction services, such as Amazon Rekognition, Microsoft Azure Cognitive Services, How-Old.net and DEX.
DBMay 29, 2019
Designing and Implementing Data Warehouse for Agricultural Big DataVuong M. Ngo, Nhien-An Le-Khac, M-Tahar Kechadi
In recent years, precision agriculture that uses modern information and communication technologies is becoming very popular. Raw and semi-processed agricultural data are usually collected through various sources, such as: Internet of Thing (IoT), sensors, satellites, weather stations, robots, farm equipment, farmers and agribusinesses, etc. Besides, agricultural datasets are very large, complex, unstructured, heterogeneous, non-standardized, and inconsistent. Hence, the agricultural data mining is considered as Big Data application in terms of volume, variety, velocity and veracity. It is a key foundation to establishing a crop intelligence platform, which will enable resource efficient agronomy decision making and recommendations. In this paper, we designed and implemented a continental level agricultural data warehouse by combining Hive, MongoDB and Cassandra. Our data warehouse capabilities: (1) flexible schema; (2) data integration from real agricultural multi datasets; (3) data science and business intelligent support; (4) high performance; (5) high storage; (6) security; (7) governance and monitoring; (8) replication and recovery; (9) consistency, availability and partition tolerant; (10) distributed and cloud deployment. We also evaluate the performance of our data warehouse.
CRMay 16, 2019
Finding Rats in Cats: Detecting Stealthy Attacks using Group Anomaly DetectionAditya Kuppa, Slawomir Grzonkowski, Muhammad Rizwan Asghar et al.
Advanced attack campaigns span across multiple stages and stay stealthy for long time periods. There is a growing trend of attackers using off-the-shelf tools and pre-installed system applications (such as \emph{powershell} and \emph{wmic}) to evade the detection because the same tools are also used by system administrators and security analysts for legitimate purposes for their routine tasks. To start investigations, event logs can be collected from operational systems; however, these logs are generic enough and it often becomes impossible to attribute a potential attack to a specific attack group. Recent approaches in the literature have used anomaly detection techniques, which aim at distinguishing between malicious and normal behavior of computers or network systems. Unfortunately, anomaly detection systems based on point anomalies are too rigid in a sense that they could miss the malicious activity and classify the attack, not an outlier. Therefore, there is a research challenge to make better detection of malicious activities. To address this challenge, in this paper, we leverage Group Anomaly Detection (GAD), which detects anomalous collections of individual data points. Our approach is to build a neural network model utilizing Adversarial Autoencoder (AAE-$α$) in order to detect the activity of an attacker who leverages off-the-shelf tools and system applications. In addition, we also build \textit{Behavior2Vec} and \textit{Command2Vec} sentence embedding deep learning models specific for feature extraction tasks. We conduct extensive experiments to evaluate our models on real-world datasets collected for a period of two months. The empirical results demonstrate that our approach is effective and robust in discovering targeted attacks, pen-tests, and attack campaigns leveraging custom tools.
CRApr 3, 2019
Leveraging Electromagnetic Side-Channel Analysis for the Investigation of IoT DevicesAsanka Sayakkara, Nhien-An Le-Khac, Mark Scanlon
Internet of Things (IoT) devices have expanded the horizon of digital forensic investigations by providing a rich set of new evidence sources. IoT devices includes health implants, sports wearables, smart burglary alarms, smart thermostats, smart electrical appliances, and many more. Digital evidence from these IoT devices is often extracted from third party sources, e.g., paired smartphone applications or the devices' back-end cloud services. However vital digital evidence can still reside solely on the IoT device itself. The specifics of the IoT device's hardware is a black-box in many cases due to the lack of proven, established techniques to inspect IoT devices. This paper presents a novel methodology to inspect the internal software activities of IoT devices through their electromagnetic radiation emissions during live device investigation. When a running IoT device is identified at a crime scene, forensically important software activities can be revealed through an electromagnetic side-channel analysis (EM-SCA) attack. By using two representative IoT hardware platforms, this work demonstrates that cryptographic algorithms running on high-end IoT devices can be detected with over 82% accuracy, while minor software code differences in low-end IoT devices could be detected over 90% accuracy using a neural network-based classifier. Furthermore, it was experimentally demonstrated that malicious modification of the stock firmware of an IoT device can be detected through machine learning-assisted EM-SCA techniques. These techniques provide a new investigative vector for digital forensic investigators to inspect IoT devices.
CRMar 18, 2019
A Survey of Electromagnetic Side-Channel Attacks and Discussion on their Case-Progressing Potential for Digital ForensicsAsanka Sayakkara, Nhien-An Le-Khac, Mark Scanlon
The increasing prevalence of Internet of Things (IoT) devices has made it inevitable that their pertinence to digital forensic investigations will increase into the foreseeable future. These devices produced by various vendors often posses limited standard interfaces for communication, such as USB ports or WiFi/Bluetooth wireless interfaces. Meanwhile, with an increasing mainstream focus on the security and privacy of user data, built-in encryption is becoming commonplace in consumer-level computing devices, and IoT devices are no exception. Under these circumstances, a significant challenge is presented to digital forensic investigations where data from IoT devices needs to be analysed. This work explores the electromagnetic (EM) side-channel analysis literature for the purpose of assisting digital forensic investigations on IoT devices. EM side-channel analysis is a technique where unintentional electromagnetic emissions are used for eavesdropping on the operations and data handling of computing devices. The non-intrusive nature of EM side-channel approaches makes it a viable option to assist digital forensic investigations as these attacks require, and must result in, no modification to the target device. The literature on various EM side-channel analysis attack techniques are discussed - selected on the basis of their applicability in IoT device investigation scenarios. The insight gained from the background study is used to identify promising future applications of the technique for digital forensic analysis on IoT devices - potentially progressing a wide variety of currently hindered digital investigations.
CRMar 17, 2019
Shining a light on Spotlight: Leveraging Apple's desktop search utility to recover deleted file metadata on macOSTajvinder Singh Atwal, Mark Scanlon, Nhien-An Le-Khac
Spotlight is a proprietary desktop search technology released by Apple in 2004 for its Macintosh operating system Mac OS X 10.4 (Tiger) and remains as a feature in current releases of macOS. Spotlight allows users to search for files or information by querying databases populated with filesystem attributes, metadata, and indexed textual content. Existing forensic research into Spotlight has provided an understanding of the metadata attributes stored within the metadata store database. Current approaches in the literature have also enabled the extraction of metadata records for extant files, but not for deleted files. The objective of this paper is to research the persistence of records for deleted files within Spotlight's metadata store, identify if deleted database pages are recoverable from unallocated space on the volume, and to present a strategy for the processing of discovered records. In this paper, the structure of the metadata store database is outlined, and experimentation reveals that records persist for a period of time within the database but once deleted, are no longer recoverable. The experimentation also demonstrates that deleted pages from the database (containing metadata records) are recoverable from unused space on the filesystem.
LGFeb 21, 2019
Performance study of distributed Apriori-like frequent itemsets miningLamine M. Aouad, Nhien-An Le-Khac, Tahar M. Kechadi
In this article, we focus on distributed Apriori-based frequent itemsets mining. We present a new distributed approach which takes into account inherent characteristics of this algorithm. We study the distribution aspect of this algorithm and give a comparison of the proposed approach with a classical Apriori-like distributed algorithm, using both analytical and experimental studies. We find that under a wide range of conditions and datasets, the performance of a distributed Apriori-like algorithm is not related to global strategies of pruning since the performance of the local Apriori generation is usually characterized by relatively high success rates of candidate sets frequency at low levels which switch to very low rates at some stage, and often drops to zero. This means that the intermediate communication steps and remote support counts computation and collection in classical distributed schemes are computationally inefficient locally, and then constrains the global performance. Our performance evaluation is done on a large cluster of workstations using the Condor system and its workflow manager DAGMan. The results show that the presented approach greatly enhances the performance and achieves good scalability compared to a typical distributed Apriori founded algorithm.
IRNov 16, 2018
Ontology based Approach for Precision AgricultureQuoc Hung Ngo, Nhien-An Le-Khac, Tahar Kechadi
In this paper, we propose a framework of knowledge for an agriculture ontology which can be used for the purpose of smart agriculture systems. This ontology not only includes basic concepts in the agricultural domain but also contains geographical, IoT, business subdomains, and other knowledge extracted from various datasets. With this ontology, any users can easily understand agricultural data links between them collected from many different data resources. In our experiment, we also import country, sub-country and disease entities into this ontology as basic entities for building agricultural linked datasets later.
CRJul 22, 2018
Digital forensic investigation of two-way radio communication equipment and servicesArie Kouwen, Mark Scanlon, Kim-Kwang Raymond Choo et al.
Historically, radio-equipment has solely been used as a two-way analogue communication device. Today, the use of radio communication equipment is increasing by numerous organisations and businesses. The functionality of these traditionally short-range devices have expanded to include private call, address book, call-logs, text messages, lone worker, telemetry, data communication, and GPS. Many of these devices also integrate with smartphones, which delivers Push-To-Talk services that make it possible to setup connections between users using a two-way radio and a smartphone. In fact, these devices can be used to connect users only using smartphones. To date, there is little research on the digital traces in modern radio communication equipment. In fact, increasing the knowledge base about these radio communication devices and services can be valuable to law enforcement in a police investigation. In this paper, we investigate what kind of radio communication equipment and services law enforcement digital investigators can encounter at a crime scene or in an investigation. Subsequent to seizure of this radio communication equipment we explore the traces, which may have a forensic interest and how these traces can be acquired. Finally, we test our approach on sample radio communication equipment and services.
CRApr 23, 2018
Forensic Analysis of the exFAT artefactsYves Vandermeer, Nhien-An Le-Khac, Joe Carthy et al.
Although keeping some basic concepts inherited from FAT32, the exFAT file system introduces many differences, such as the new mapping scheme of directory entries. The combination of exFAT mapping scheme with the allocation of bitmap files and the use of FAT leads to new forensic possibilities. The recovery of deleted files, including fragmented ones and carving becomes more accurate compared with former forensic processes. Nowadays, the accurate and sound forensic analysis is more than ever needed, as there is a high risk of erroneous interpretation. Indeed, most of the related work in the literature on exFAT structure and forensics, is mainly based on reverse engineering research, and only few of them cover the forensic interpretation. In this paper, we propose a new methodology using of exFAT file systems features to improve the interpretation of inactive entries by using bitmap file analysis and recover the file system metadata information for carved files. Experimental results show how our approach improves the forensic interpretation accuracy.
CRApr 23, 2018
Unmanned Aerial Vehicle Forensic Investigation Process: Dji Phantom 3 Drone As A Case StudyAlan Roder, Kim-Kwang Raymon Choo, Nhien-An Le-Khac
Drones (also known as Unmanned Aerial Vehicles, UAVs) is a potential source of evidence in a digital investigation, partly due to their increasing popularity in our society. However, existing UAV/drone forensics generally rely on conventional digital forensic investigation guidelines such as those of ACPO and NIST, which may not be entirely fit_for_purpose. In this paper, we identify the challenges associated with UAV/drone forensics. We then explore and evaluate existing forensic guidelines, in terms of their effectiveness for UAV/drone forensic investigations. Next, we present our set of guidelines for UAV/drone investigations. Finally, we demonstrate how the proposed guidelines can be used to guide a drone forensic investigation using the DJI Phantom 3 drone as a case study.
DBFeb 1, 2018
Hierarchical Aggregation Approach for Distributed clustering of spatial datasetsMalika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi
In this paper, we present a new approach of distributed clustering for spatial datasets, based on an innovative and efficient aggregation technique. This distributed approach consists of two phases: 1) local clustering phase, where each node performs a clustering on its local data, 2) aggregation phase, where the local clusters are aggregated to produce global clusters. This approach is characterised by the fact that the local clusters are represented in a simple and efficient way. And The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in both response time and memory allocation. We evaluated the approach with different datasets and compared it to well-known clustering techniques. The experimental results show that our approach is very promising and outperforms all those algorithms
CRJan 31, 2018
Internet of things forensics: Challenges and Case StudySaad Alabdulsalam, Kevin Schaefer, Tahar Kechadi et al.
Today is the era of Internet of Things (IoT), millions of machines such as cars, smoke detectors, watches, glasses, webcams, etc. are being connected to the Internet. The number of machines that possess the ability of remote access to monitor and collect data is continuously increasing. This development makes, on one hand, the human life more comfort- able, convenient, but it also raises on other hand issues on security and privacy. However, this development also raises challenges for the digital investigator when IoT devices involve in criminal scenes. Indeed, current research in the literature focuses on security and privacy for IoT environments rather than methods or techniques of forensic acquisition and analysis for IoT devices. Therefore, in this paper, we discuss firstly different aspects related to IoT forensics and then focus on the cur- rent challenges. We also describe forensic approaches for a IoT device smartwatch as a case study. We analyze forensic artifacts retrieved from smartwatch devices and discuss on evidence found aligned with challenges in IoT forensics
LGJan 31, 2018
One-class Collective Anomaly Detection based on Long Short-Term Memory Recurrent Neural NetworksNga Nguyen Thi, Van Loi Cao, Nhien-An Le-Khac
Intrusion detection for computer network systems has been becoming one of the most critical tasks for network administrators today. It has an important role for organizations, governments and our society due to the valuable resources hosted on computer networks. Traditional misuse detection strategies are unable to detect new and unknown intrusion types. In contrast, anomaly detection in network security aims to distinguish between illegal or malicious events and normal behavior of network systems. Anomaly detection can be considered as a classification problem where it builds models of normal network behavior, of which it uses to detect new patterns that significantly deviate from the model. Most of the current approaches on anomaly detection is based on the learning of normal behavior and anomalous actions. They do not include memory that is they do not take into account previous events classify new ones. In this paper, we propose a one class collective anomaly detection model based on neural network learning. Normally a Long Short Term Memory Recurrent Neural Network (LSTM RNN) is trained only on normal data, and it is capable of predicting several time steps ahead of an input. In our approach, a LSTM RNN is trained on normal time series data before performing a prediction for each time step. Instead of considering each time-step separately, the observation of prediction errors from a certain number of time-steps is now proposed as a new idea for detecting collective anomalies. The prediction errors of a certain number of the latest time-steps above a threshold will indicate a collective anomaly. The model is evaluated on a time series version of the KDD 1999 dataset. The experiments demonstrate that the proposed model is capable to detect collective anomaly efficiently
CRDec 15, 2017
Network Intell: Enabling the Non-Expert Analysis of Large Volumes of Intercepted Network TrafficErwin van de Wiel, Mark Scanlon, Nhien-An Le-Khac
In criminal investigations, telecommunication wiretaps have become a common technique used by law enforcement. While phone-based wiretapping is well documented and the procedure for their execution are well known, the same cannot be said for Internet taps. Lawfully intercepted network traffic often contains a lot of encrypted traffic making it increasingly difficult to find useful information inside the traffic captured. The advent of Internet-of-Things further complicates the process for non-technical investigators. The current level of complexity of intercepted network traffic is close to a point where data cannot be analysed without supervision of a digital investigator with advanced network knowledge. Current investigations focus on analysing all traffic in a chronological manner and are predominately conducted on the data contents of the intercepted traffic. This approach often becomes overly arduous when the amount of data to be analysed becomes very large. In this paper, we propose a novel approach to analyse large amounts of intercepted network traffic based on network metadata. Our approach significantly reduces the duration of the analysis and also produces an insight view of analysing results for the non-technical investigator. We also test our approach with a large sample of network traffic data.
CROct 26, 2017
Privacy Preserving Internet Browsers: Forensic Analysis of BrowzarChristopher Warren, Eman El-Sheikh, Nhien-An Le-Khac
With the advance of technology, Criminal Justice agencies are being confronted with an increased need to investigate crimes perpetuated partially or entirely over the Internet. These types of crime are known as cybercrimes. In order to conceal illegal online activity, criminals often use private browsing features or browsers designed to provide total browsing privacy. The use of private browsing is a common challenge faced in for example child exploitation investigations, which usually originate on the Internet. Although private browsing features are not designed specifically for criminal activity, they have become a valuable tool for criminals looking to conceal their online activity. As such, Technological Crime units often focus their forensic analysis on thoroughly examining the web history on a computer. Private browsing features and browsers often require a more in-depth, post mortem analysis. This often requires the use of multiple tools, as well as different forensic approaches to uncover incriminating evidence. This evidence may be required in a court of law, where analysts are often challenged both on their findings and on the tools and approaches used to recover evidence. However, there are very few research on evaluating of private browsing in terms of privacy preserving as well as forensic acquisition and analysis of privacy preserving internet browsers. Therefore in this chapter, we firstly review the private mode of popular internet browsers. Next, we describe the forensic acquisition and analysis of Browzar, a privacy preserving internet browser and compare it with other popular internet browsers
CRAug 29, 2017
Increasing digital investigator availability through efficient workflow management and automationRonald In de Braekt, Nhien-An Le-Khac, Jason Farina et al.
The growth of digital storage capacities and diversity devices has had a significant time impact on digital forensic laboratories in law enforcement. Backlogs have become commonplace and increasingly more time is spent in the acquisition and preparation steps of an investigation as opposed to detailed evidence analysis and reporting. There is generally little room for increasing digital investigation capacity in law enforcement digital forensic units and the allocated budgets for these units are often decreasing. In the context of developing an efficient investigation process, one of the key challenges amounts to how to achieve more with less. This paper proposes a workflow management automation framework for handling common digital forensic tools. The objective is to streamline the digital investigation workflow - enabling more efficient use of limited hardware and software. The proposed automation framework reduces the time digital forensic experts waste conducting time-consuming, though necessary, tasks. The evidence processing time is decreased through server-side automation resulting in 24/7 evidence preparation. The proposed framework increases efficiency of use of forensic software and hardware, reduces the infrastructure costs and license fees, and simplifies the preparation steps for the digital investigator. The proposed approach is evaluated in a real-world scenario to evaluate its robustness and highlight its benefits.
CRAug 29, 2017
Investigation and Automating Extraction of Thumbnails Produced by Image viewersWybren van der Meer, Kim-Kwang Raymond Choo, Nhien-An Le-Khac et al.
Today, in digital forensics, images normally provide important information within an investigation. However, not all images may still be available within a forensic digital investigation as they were all deleted for example. Data carving can be used in this case to retrieve deleted images but the carving time is normally significant and these images can be moreover overwritten by other data. One of the solutions is to look at thumbnails of images that are no longer available. These thumbnails can often be found within databases created by either operating systems or image viewers. In literature, most research and practical focus on the extraction of thumbnails from databases created by the operating system. There is a little research working on the thumbnails created by the image reviewers as these thumbnails are application-driven in terms of pre-defined sizes, adjustments and storage location. Eventually, thumbnail databases from image viewers are significant forensic artefacts for investigators as these programs deal with large amounts of images. However, investigating these databases so far is still manual or semi-automatic task that leads to the huge amount of forensic time. Therefore, in this paper we propose a new approach of automating extraction of thumbnails produced by image viewers. We also test our approach with popular image viewers in different storage structures and locations to show its robustness.
CRAug 5, 2017
Private Web Browser Forensics: A Case Study of the Epic Privacy BrowserAlan Reed, Mark Scanlon, Nhien-An Le-Khac
Organised crime, as well as individual criminals, is benefiting from the protection of private browsers provide to those who would carry out illegal activity, such as money laundering, drug trafficking, the online exchange of child-abuse material, etc. The protection afforded to users of the Epic Privacy Browser illustrates these benefits. This browser is currently in use in approximately 180 countries worldwide. This paper outlines the location and type of evidence available through live and post-mortem state analyses of the Epic Privacy Browser. This study identifies the manner in which the browser functions during use, where evidence can be recovered after use, as well as the tools and effective presentation of the recovered material.
CRAug 5, 2017
Integration of Ether Unpacker into Ragpicker for plugin-based Malware Analysis and IdentificationErik Schaefer, Nhien-An Le-Khac, Mark Scanlon
Malware is a pervasive problem in both personal computing devices and distributed computing systems. Identification of malware variants and their families others a great benefit in early detection resulting in a reduction of the analyses time needed. In order to classify malware, most of the current approaches are based on the analysis of the unpacked and unencrypted binaries. However, most of the unpacking solutions in the literature have a low unpacking rate. This results in a low contribution towards the identification of transferred code and re-used code. To develop a new malware analysis solution based on clusters of binary code sections, it is required to focus on increasing of the unpacking rate of malware samples to extend the underlying code database. In this paper, we present a new approach of analysing malware by integrating Ether Unpacker into the plugin-based malware analysis tool, Ragpicker. We also evaluate our approach against real-world malware patterns.
CRAug 5, 2017
Evaluation of Digital Forensic Process Models with Respect to Digital Forensics as a ServiceXiaoyu Du, Nhien-An Le-Khac, Mark Scanlon
Digital forensic science is very much still in its infancy, but is becoming increasingly invaluable to investigators. A popular area for research is seeking a standard methodology to make the digital forensic process accurate, robust, and efficient. The first digital forensic process model proposed contains four steps: Acquisition, Identification, Evaluation and Admission. Since then, numerous process models have been proposed to explain the steps of identifying, acquiring, analysing, storage, and reporting on the evidence obtained from various digital devices. In recent years, an increasing number of more sophisticated process models have been proposed. These models attempt to speed up the entire investigative process or solve various of problems commonly encountered in the forensic investigation. In the last decade, cloud computing has emerged as a disruptive technological concept, and most leading enterprises such as IBM, Amazon, Google, and Microsoft have set up their own cloud-based services. In the field of digital forensic investigation, moving to a cloud-based evidence processing model would be extremely beneficial and preliminary attempts have been made in its implementation. Moving towards a Digital Forensics as a Service model would not only expedite the investigative process, but can also result in significant cost savings - freeing up digital forensic experts and law enforcement personnel to progress their caseload. This paper aims to evaluate the applicability of existing digital forensic process models and analyse how each of these might apply to a cloud-based evidence processing paradigm.
CRAug 5, 2017
Privileged Data within Digital EvidenceDominique Fleurbaaij, Mark Scanlon, Nhien-An Le-Khac
In recent years the use of digital communication has increased. This also increased the chance to find privileged data in the digital evidence. Privileged data is protected by law from viewing by anyone other than the client. It is up to the digital investigator to handle this privileged data properly without being able to view the contents. Procedures on handling this information are available, but do not provide any practical information nor is it known how effective filtering is. The objective of this paper is to describe the handling of privileged data in the current digital forensic tools and the creation of a script within the digital forensic tool Nuix. The script automates the handling of privileged data to minimize the exposure of the contents to the digital investigator. The script also utilizes technology within Nuix that extends the automated search of identical privileged document to relate files based on their contents. A comparison of the 'traditional' ways of filtering within the digital forensic tools and the script written in Nuix showed that digital forensic tools are still limited when used on privileged data. The script manages to increase the effectiveness as direct result of the use of relations based on file content.
DCApr 11, 2017
Feature Selection Parallel Technique for Remotely Sensed Imagery ClassificationNhien-An Le-Khac, M-Tahar Kechadi, Bo Wu et al.
Remote sensing research focusing on feature selection has long attracted the attention of the remote sensing community because feature selection is a prerequisite for image processing and various applications. Different feature selection methods have been proposed to improve the classification accuracy. They vary from basic search techniques to clonal selections, and various optimal criteria have been investigated. Recently, methods using dependence-based measures have attracted much attention due to their ability to deal with very high dimensional datasets. However, these methods are based on Cramers V test, which has performance issues with large datasets. In this paper, we propose a parallel approach to improve their performance. We evaluate our approach on hyper-spectral and high spatial resolution images and compare it to the proposed methods with a centralized version as preliminary results. The results are very promising.