CRJul 10, 2023
ChatGPT for Digital Forensic Investigation: The Good, The Bad, and The UnknownMark Scanlon, Frank Breitinger, Christopher Hargreaves et al.
The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the impact and potential impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4. A series of experiments are conducted to assess its capability across several digital forensic use cases including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. Across these topics, its strengths and risks are outlined and a number of general conclusions are drawn. Overall this paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present, since the evidence would need to be uploaded to the service, or they require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, it could act as a useful supporting tool in some circumstances.
CVDec 17, 2025Code
VAAS: Vision-Attention Anomaly Scoring for Image Manipulation Detection in Digital ForensicsOpeyemi Bamigbade, Mark Scanlon, John Sheppard
Recent advances in AI-driven image generation have introduced new challenges for verifying the authenticity of digital evidence in forensic investigations. Modern generative models can produce visually consistent forgeries that evade traditional detectors based on pixel or compression artefacts. Most existing approaches also lack an explicit measure of anomaly intensity, which limits their ability to quantify the severity of manipulation. This paper introduces Vision-Attention Anomaly Scoring (VAAS), a novel dual-module framework that integrates global attention-based anomaly estimation using Vision Transformers (ViT) with patch-level self-consistency scoring derived from SegFormer embeddings. The hybrid formulation provides a continuous and interpretable anomaly score that reflects both the location and degree of manipulation. Evaluations on the DF2023 and CASIA v2.0 datasets demonstrate that VAAS achieves competitive F1 and IoU performance, while enhancing visual explainability through attention-guided anomaly maps. The framework bridges quantitative detection with human-understandable reasoning, supporting transparent and reliable image integrity assessment. The source code for all experiments and corresponding materials for reproducing the results are available open source.
CVDec 18, 2025Code
Plug to Place: Indoor Multimedia Geolocation from Electrical Sockets for Digital InvestigationKanwal Aftab, Graham Adams, Mark Scanlon
Computer vision is a rapidly evolving field, giving rise to powerful new tools and techniques in digital forensic investigation, and shows great promise for novel digital forensic applications. One such application, indoor multimedia geolocation, has the potential to become a crucial aid for law enforcement in the fight against human trafficking, child exploitation, and other serious crimes. While outdoor multimedia geolocation has been widely explored, its indoor counterpart remains underdeveloped due to challenges such as similar room layouts, frequent renovations, visual ambiguity, indoor lighting variability, unreliable GPS signals, and limited datasets in sensitive domains. This paper introduces a pipeline that uses electric sockets as consistent indoor markers for geolocation, since plug socket types are standardised by country or region. The three-stage deep learning pipeline detects plug sockets (YOLOv11, mAP@0.5 = 0.843), classifies them into one of 12 plug socket types (Xception, accuracy = 0.912), and maps the detected socket types to countries (accuracy = 0.96 at >90% threshold confidence). To address data scarcity, two dedicated datasets were created: socket detection dataset of 2,328 annotated images expanded to 4,072 through augmentation, and a classification dataset of 3,187 images across 12 plug socket classes. The pipeline was evaluated on the Hotels-50K dataset, focusing on the TraffickCam subset of crowd-sourced hotel images, which capture real-world conditions such as poor lighting and amateur angles. This dataset provides a more realistic evaluation than using professional, well-lit, often wide-angle images from travel websites. This framework demonstrates a practical step toward real-world digital forensic applications. The code, trained models, and the data for this paper are available open source.
CRDec 2, 2020Code
Smarter Password Guessing Techniques Leveraging Contextual Information and OSINTAikaterini Kanta, Iwen Coisel, Mark Scanlon
In recent decades, criminals have increasingly used the web to research, assist and perpetrate criminal behaviour. One of the most important ways in which law enforcement can battle this growing trend is through accessing pertinent information about suspects in a timely manner. A significant hindrance to this is the difficulty of accessing any system a suspect uses that requires authentication via password. Password guessing techniques generally consider common user behaviour while generating their passwords, as well as the password policy in place. Such techniques can offer a modest success rate considering a large/average population. However, they tend to fail when focusing on a single target -- especially when the latter is an educated user taking precautions as a savvy criminal would be expected to do. Open Source Intelligence is being increasingly leveraged by Law Enforcement in order to gain useful information about a suspect, but very little is currently being done to integrate this knowledge in an automated way within password cracking. The purpose of this research is to delve into the techniques that enable the gathering of the necessary context about a suspect and find ways to leverage this information within password guessing techniques.
MMDec 2, 2020Code
Retracing the Flow of the Stream: Investigating Kodi Streaming ServicesSamuel Todd Bromley, John Sheppard, Mark Scanlon et al.
Kodi is of one of the world's largest open-source streaming platforms for viewing video content. Easily installed Kodi add-ons facilitate access to online pirated videos and streaming content by facilitating the user to search and view copyrighted videos with a basic level of technical knowledge. In some countries, there have been paid child sexual abuse organizations publishing/streaming child abuse material to an international paying clientele. Open source software used for viewing videos from the Internet, such as Kodi, is being exploited by criminals to conduct their activities. In this paper, we describe a new method to quickly locate Kodi artifacts and gather information for a successful prosecution. We also evaluate our approach on different platforms; Windows, Android and Linux. Our experiments show the file location, artifacts and a history of viewed content including their locations from the Internet. Our approach will serve as a resource to forensic investigators to examine Kodi or similar streaming platforms.
CRFeb 29, 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation EfficiencyAkila Wickramasekara, Frank Breitinger, Mark Scanlon
The ever-increasing workload of digital forensic labs raises concerns about law enforcement's ability to conduct both cyber-related and non-cyber-related investigations promptly. Consequently, this article explores the potential and usefulness of integrating Large Language Models (LLMs) into digital forensic investigations to address challenges such as bias, explainability, censorship, resource-intensive infrastructure, and ethical and legal considerations. A comprehensive literature review is carried out, encompassing existing digital forensic models, tools, LLMs, deep learning techniques, and the use of LLMs in investigations. The review identifies current challenges within existing digital forensic processes and explores both the obstacles and the possibilities of incorporating LLMs. In conclusion, the study states that the adoption of LLMs in digital forensics, with appropriate constraints, has the potential to improve investigation efficiency, improve traceability, and alleviate the technical and judicial barriers faced by law enforcement entities.
CVFeb 23, 2024
Computer Vision for Multimedia Geolocation in Human Trafficking Investigation: A Systematic Literature ReviewOpeyemi Bamigbade, John Sheppard, Mark Scanlon
The task of multimedia geolocation is becoming an increasingly essential component of the digital forensics toolkit to effectively combat human trafficking, child sexual exploitation, and other illegal acts. Typically, metadata-based geolocation information is stripped when multimedia content is shared via instant messaging and social media. The intricacy of geolocating, geotagging, or finding geographical clues in this content is often overly burdensome for investigators. Recent research has shown that contemporary advancements in artificial intelligence, specifically computer vision and deep learning, show significant promise towards expediting the multimedia geolocation task. This systematic literature review thoroughly examines the state-of-the-art leveraging computer vision techniques for multimedia geolocation and assesses their potential to expedite human trafficking investigation. This includes a comprehensive overview of the application of computer vision-based approaches to multimedia geolocation, identifies their applicability in combating human trafficking, and highlights the potential implications of enhanced multimedia geolocation for prosecuting human trafficking. 123 articles inform this systematic literature review. The findings suggest numerous potential paths for future impactful research on the subject.
CRDec 2, 2020
Automated Artefact Relevancy Determination from Artefact Metadata and Associated Timeline EventsXiaoyu Du, Quan Le, Mark Scanlon
Case-hindering, multi-year digital forensic evidence backlogs have become commonplace in law enforcement agencies throughout the world. This is due to an ever-growing number of cases requiring digital forensic investigation coupled with the growing volume of data to be processed per case. Leveraging previously processed digital forensic cases and their component artefact relevancy classifications can facilitate an opportunity for training automated artificial intelligence based evidence processing systems. These can significantly aid investigators in the discovery and prioritisation of evidence. This paper presents one approach for file artefact relevancy determination building on the growing trend towards a centralised, Digital Forensics as a Service (DFaaS) paradigm. This approach enables the use of previously encountered pertinent files to classify newly discovered files in an investigation. Trained models can aid in the detection of these files during the acquisition stage, i.e., during their upload to a DFaaS system. The technique generates a relevancy score for file similarity using each artefact's filesystem metadata and associated timeline events. The approach presented is validated against three experimental usage scenarios.
CVDec 2, 2020
Assessing the Influencing Factors on the Accuracy of Underage Facial Age EstimationFelix Anda, Brett A. Becker, David Lillis et al.
Swift response to the detection of endangered minors is an ongoing concern for law enforcement. Many child-focused investigations hinge on digital evidence discovery and analysis. Automated age estimation techniques are needed to aid in these investigations to expedite this evidence discovery process, and decrease investigator exposure to traumatic material. Automated techniques also show promise in decreasing the overflowing backlog of evidence obtained from increasing numbers of devices and online services. A lack of sufficient training data combined with natural human variance has been long hindering accurate automated age estimation -- especially for underage subjects. This paper presented a comprehensive evaluation of the performance of two cloud age estimation services (Amazon Web Service's Rekognition service and Microsoft Azure's Face API) against a dataset of over 21,800 underage subjects. The objective of this work is to evaluate the influence that certain human biometric factors, facial expressions, and image quality (i.e. blur, noise, exposure and resolution) have on the outcome of automated age estimation services. A thorough evaluation allows us to identify the most influential factors to be overcome in future age estimation systems.
CRDec 2, 2020
SoK: Exploring the State of the Art and the Future Potential of Artificial Intelligence in Digital Forensic InvestigationXiaoyu Du, Chris Hargreaves, John Sheppard et al.
Multi-year digital forensic backlogs have become commonplace in law enforcement agencies throughout the globe. Digital forensic investigators are overloaded with the volume of cases requiring their expertise compounded by the volume of data to be processed. Artificial intelligence is often seen as the solution to many big data problems. This paper summarises existing artificial intelligence based tools and approaches in digital forensics. Automated evidence processing leveraging artificial intelligence based techniques shows great promise in expediting the digital forensic analysis process while increasing case processing capacities. For each application of artificial intelligence highlighted, a number of current challenges and future potential impact is discussed.
CVJul 2, 2019
Improving Borderline Adulthood Facial Age Estimation through Ensemble LearningFelix Anda, David Lillis, Aikaterini Kanta et al.
Achieving high performance for facial age estimation with subjects in the borderline between adulthood and non-adulthood has always been a challenge. Several studies have used different approaches from the age of a baby to an elder adult and different datasets have been employed to measure the mean absolute error (MAE) ranging between 1.47 to 8 years. The weakness of the algorithms specifically in the borderline has been a motivation for this paper. In our approach, we have developed an ensemble technique that improves the accuracy of underage estimation in conjunction with our deep learning model (DS13K) that has been fine-tuned on the Deep Expectation (DEX) model. We have achieved an accuracy of 68% for the age group 16 to 17 years old, which is 4 times better than the DEX accuracy for such age range. We also present an evaluation of existing cloud-based and offline facial age prediction services, such as Amazon Rekognition, Microsoft Azure Cognitive Services, How-Old.net and DEX.
CRJul 2, 2019
Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic ArtefactsXiaoyu Du, Mark Scanlon
The ever increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Usually, most of the file artefacts on seized devices are not pertinent to the investigation. Manually retrieving suspicious files relevant to the investigation is akin to finding a needle in a haystack. In this paper, a methodology for the automatic prioritisation of suspicious file artefacts (i.e., file artefacts that are pertinent to the investigation) is proposed to reduce the manual analysis effort required. This methodology is designed to work in a human-in-the-loop fashion. In other words, it predicts/recommends that an artefact is likely to be suspicious rather than giving the final analysis result. A supervised machine learning approach is employed, which leverages the recorded results of previously processed cases. The process of features extraction, dataset generation, training and evaluation are presented in this paper. In addition, a toolkit for data extraction from disk images is outlined, which enables this method to be integrated with the conventional investigation process and work in an automated fashion.
CRApr 3, 2019
Leveraging Electromagnetic Side-Channel Analysis for the Investigation of IoT DevicesAsanka Sayakkara, Nhien-An Le-Khac, Mark Scanlon
Internet of Things (IoT) devices have expanded the horizon of digital forensic investigations by providing a rich set of new evidence sources. IoT devices includes health implants, sports wearables, smart burglary alarms, smart thermostats, smart electrical appliances, and many more. Digital evidence from these IoT devices is often extracted from third party sources, e.g., paired smartphone applications or the devices' back-end cloud services. However vital digital evidence can still reside solely on the IoT device itself. The specifics of the IoT device's hardware is a black-box in many cases due to the lack of proven, established techniques to inspect IoT devices. This paper presents a novel methodology to inspect the internal software activities of IoT devices through their electromagnetic radiation emissions during live device investigation. When a running IoT device is identified at a crime scene, forensically important software activities can be revealed through an electromagnetic side-channel analysis (EM-SCA) attack. By using two representative IoT hardware platforms, this work demonstrates that cryptographic algorithms running on high-end IoT devices can be detected with over 82% accuracy, while minor software code differences in low-end IoT devices could be detected over 90% accuracy using a neural network-based classifier. Furthermore, it was experimentally demonstrated that malicious modification of the stock firmware of an IoT device can be detected through machine learning-assisted EM-SCA techniques. These techniques provide a new investigative vector for digital forensic investigators to inspect IoT devices.
CRMar 18, 2019
A Survey of Electromagnetic Side-Channel Attacks and Discussion on their Case-Progressing Potential for Digital ForensicsAsanka Sayakkara, Nhien-An Le-Khac, Mark Scanlon
The increasing prevalence of Internet of Things (IoT) devices has made it inevitable that their pertinence to digital forensic investigations will increase into the foreseeable future. These devices produced by various vendors often posses limited standard interfaces for communication, such as USB ports or WiFi/Bluetooth wireless interfaces. Meanwhile, with an increasing mainstream focus on the security and privacy of user data, built-in encryption is becoming commonplace in consumer-level computing devices, and IoT devices are no exception. Under these circumstances, a significant challenge is presented to digital forensic investigations where data from IoT devices needs to be analysed. This work explores the electromagnetic (EM) side-channel analysis literature for the purpose of assisting digital forensic investigations on IoT devices. EM side-channel analysis is a technique where unintentional electromagnetic emissions are used for eavesdropping on the operations and data handling of computing devices. The non-intrusive nature of EM side-channel approaches makes it a viable option to assist digital forensic investigations as these attacks require, and must result in, no modification to the target device. The literature on various EM side-channel analysis attack techniques are discussed - selected on the basis of their applicability in IoT device investigation scenarios. The insight gained from the background study is used to identify promising future applications of the technique for digital forensic analysis on IoT devices - potentially progressing a wide variety of currently hindered digital investigations.
CRMar 17, 2019
Shining a light on Spotlight: Leveraging Apple's desktop search utility to recover deleted file metadata on macOSTajvinder Singh Atwal, Mark Scanlon, Nhien-An Le-Khac
Spotlight is a proprietary desktop search technology released by Apple in 2004 for its Macintosh operating system Mac OS X 10.4 (Tiger) and remains as a feature in current releases of macOS. Spotlight allows users to search for files or information by querying databases populated with filesystem attributes, metadata, and indexed textual content. Existing forensic research into Spotlight has provided an understanding of the metadata attributes stored within the metadata store database. Current approaches in the literature have also enabled the extraction of metadata records for extant files, but not for deleted files. The objective of this paper is to research the persistence of records for deleted files within Spotlight's metadata store, identify if deleted database pages are recoverable from unallocated space on the volume, and to present a strategy for the processing of discovered records. In this paper, the structure of the metadata store database is outlined, and experimentation reveals that records persist for a period of time within the database but once deleted, are no longer recoverable. The experimentation also demonstrates that deleted pages from the database (containing metadata records) are recoverable from unused space on the filesystem.
CRJul 22, 2018
Deep learning at the shallow end: Malware classification for non-domain expertsQuan Le, Oisín Boydell, Brian Mac Namee et al.
Current malware detection and classification approaches generally rely on time consuming and knowledge intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware file as a whole. We present a Deep Learning based malware classification approach that requires no expert domain knowledge and is based on a purely data driven approach for complex pattern and feature identification.
CRJul 22, 2018
Digital forensic investigation of two-way radio communication equipment and servicesArie Kouwen, Mark Scanlon, Kim-Kwang Raymond Choo et al.
Historically, radio-equipment has solely been used as a two-way analogue communication device. Today, the use of radio communication equipment is increasing by numerous organisations and businesses. The functionality of these traditionally short-range devices have expanded to include private call, address book, call-logs, text messages, lone worker, telemetry, data communication, and GPS. Many of these devices also integrate with smartphones, which delivers Push-To-Talk services that make it possible to setup connections between users using a two-way radio and a smartphone. In fact, these devices can be used to connect users only using smartphones. To date, there is little research on the digital traces in modern radio communication equipment. In fact, increasing the knowledge base about these radio communication devices and services can be valuable to law enforcement in a police investigation. In this paper, we investigate what kind of radio communication equipment and services law enforcement digital investigators can encounter at a crime scene or in an investigation. Subsequent to seizure of this radio communication equipment we explore the traces, which may have a forensic interest and how these traces can be acquired. Finally, we test our approach on sample radio communication equipment and services.
CRDec 15, 2017
Network Intell: Enabling the Non-Expert Analysis of Large Volumes of Intercepted Network TrafficErwin van de Wiel, Mark Scanlon, Nhien-An Le-Khac
In criminal investigations, telecommunication wiretaps have become a common technique used by law enforcement. While phone-based wiretapping is well documented and the procedure for their execution are well known, the same cannot be said for Internet taps. Lawfully intercepted network traffic often contains a lot of encrypted traffic making it increasingly difficult to find useful information inside the traffic captured. The advent of Internet-of-Things further complicates the process for non-technical investigators. The current level of complexity of intercepted network traffic is close to a point where data cannot be analysed without supervision of a digital investigator with advanced network knowledge. Current investigations focus on analysing all traffic in a chronological manner and are predominately conducted on the data contents of the intercepted traffic. This approach often becomes overly arduous when the amount of data to be analysed becomes very large. In this paper, we propose a novel approach to analyse large amounts of intercepted network traffic based on network metadata. Our approach significantly reduces the duration of the analysis and also produces an insight view of analysing results for the non-technical investigator. We also test our approach with a large sample of network traffic data.
CRDec 12, 2017
Hierarchical Bloom Filter Trees for Approximate MatchingDavid Lillis, Frank Breitinger, Mark Scanlon
Bytewise approximate matching algorithms have in recent years shown significant promise in de- tecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of "known-illegal" files (e.g. a collection of child abuse material) and wishes to find whether copies of these are stored on the seized device. Approximate matching addresses shortcomings in traditional hashing, which can only find identical files, by also being able to deal with cases of merged files, embedded files, partial files, or if a file has been changed in any way. Most approximate matching algorithms work by comparing pairs of files, which is not a scalable approach when faced with large corpora. This paper demonstrates the effectiveness of using a "Hierarchical Bloom Filter Tree" (HBFT) data structure to reduce the running time of collection-against-collection matching, with a specific focus on the MRSH-v2 algorithm. Three experiments are discussed, which explore the effects of different configurations of HBFTs. The proposed approach dramatically reduces the number of pairwise comparisons required, and demonstrates substantial speed gains, while maintaining effectiveness.
CRDec 10, 2017
Study of Peer-to-Peer Network Based Cybercrime Investigation: Application on Botnet TechnologiesMark Scanlon
The scalable, low overhead attributes of Peer-to-Peer (P2P) Internet protocols and networks lend themselves well to being exploited by criminals to execute a large range of cybercrimes. The types of crimes aided by P2P technology include copyright infringement, sharing of illicit images of children, fraud, hacking/cracking, denial of service attacks and virus/malware propagation through the use of a variety of worms, botnets, malware, viruses and P2P file sharing. This project is focused on study of active P2P nodes along with the analysis of the undocumented communication methods employed in many of these large unstructured networks. This is achieved through the design and implementation of an efficient P2P monitoring and crawling toolset. The requirement for investigating P2P based systems is not limited to the more obvious cybercrimes listed above, as many legitimate P2P based applications may also be pertinent to a digital forensic investigation, e.g, voice over IP, instant messaging, etc. Investigating these networks has become increasingly difficult due to the broad range of network topologies and the ever increasing and evolving range of P2P based applications. In this work we introduce the Universal P2P Network Investigation Framework (UP2PNIF), a framework which enables significantly faster and less labour intensive investigation of newly discovered P2P networks through the exploitation of the commonalities in P2P network functionality. In combination with a reference database of known network characteristics, it is envisioned that any known P2P network can be instantly investigated using the framework, which can intelligently determine the best investigation methodology and greatly expedite the evidence gathering process. A proof of concept tool was developed for conducting investigations on the BitTorrent network.
CRDec 7, 2017
Enabling the Remote Acquisition of Digital Forensic Evidence through Secure Data Transmission and VerificationMark Scanlon
Providing the ability to any law enforcement officer to remotely transfer an image from any suspect computer directly to a forensic laboratory for analysis, can only help to greatly reduce the time wasted by forensic investigators in conducting on-site collection of computer equipment. RAFT (Remote Acquisition Forensic Tool) is a system designed to facilitate forensic investigators by remotely gathering digital evidence. This is achieved through the implementation of a secure, verifiable client/server imaging architecture. The RAFT system is designed to be relatively easy to use, requiring minimal technical knowledge on behalf of the user. One of the key focuses of RAFT is to ensure that the evidence it gathers remotely is court admissible. This is achieved by ensuring that the image taken using RAFT is verified to be identical to the original evidence on a suspect computer.
CRAug 29, 2017
Increasing digital investigator availability through efficient workflow management and automationRonald In de Braekt, Nhien-An Le-Khac, Jason Farina et al.
The growth of digital storage capacities and diversity devices has had a significant time impact on digital forensic laboratories in law enforcement. Backlogs have become commonplace and increasingly more time is spent in the acquisition and preparation steps of an investigation as opposed to detailed evidence analysis and reporting. There is generally little room for increasing digital investigation capacity in law enforcement digital forensic units and the allocated budgets for these units are often decreasing. In the context of developing an efficient investigation process, one of the key challenges amounts to how to achieve more with less. This paper proposes a workflow management automation framework for handling common digital forensic tools. The objective is to streamline the digital investigation workflow - enabling more efficient use of limited hardware and software. The proposed automation framework reduces the time digital forensic experts waste conducting time-consuming, though necessary, tasks. The evidence processing time is decreased through server-side automation resulting in 24/7 evidence preparation. The proposed framework increases efficiency of use of forensic software and hardware, reduces the infrastructure costs and license fees, and simplifies the preparation steps for the digital investigator. The proposed approach is evaluated in a real-world scenario to evaluate its robustness and highlight its benefits.
CRAug 5, 2017
Private Web Browser Forensics: A Case Study of the Epic Privacy BrowserAlan Reed, Mark Scanlon, Nhien-An Le-Khac
Organised crime, as well as individual criminals, is benefiting from the protection of private browsers provide to those who would carry out illegal activity, such as money laundering, drug trafficking, the online exchange of child-abuse material, etc. The protection afforded to users of the Epic Privacy Browser illustrates these benefits. This browser is currently in use in approximately 180 countries worldwide. This paper outlines the location and type of evidence available through live and post-mortem state analyses of the Epic Privacy Browser. This study identifies the manner in which the browser functions during use, where evidence can be recovered after use, as well as the tools and effective presentation of the recovered material.
CRAug 5, 2017
Integration of Ether Unpacker into Ragpicker for plugin-based Malware Analysis and IdentificationErik Schaefer, Nhien-An Le-Khac, Mark Scanlon
Malware is a pervasive problem in both personal computing devices and distributed computing systems. Identification of malware variants and their families others a great benefit in early detection resulting in a reduction of the analyses time needed. In order to classify malware, most of the current approaches are based on the analysis of the unpacked and unencrypted binaries. However, most of the unpacking solutions in the literature have a low unpacking rate. This results in a low contribution towards the identification of transferred code and re-used code. To develop a new malware analysis solution based on clusters of binary code sections, it is required to focus on increasing of the unpacking rate of malware samples to extend the underlying code database. In this paper, we present a new approach of analysing malware by integrating Ether Unpacker into the plugin-based malware analysis tool, Ragpicker. We also evaluate our approach against real-world malware patterns.
CRAug 5, 2017
Evaluation of Digital Forensic Process Models with Respect to Digital Forensics as a ServiceXiaoyu Du, Nhien-An Le-Khac, Mark Scanlon
Digital forensic science is very much still in its infancy, but is becoming increasingly invaluable to investigators. A popular area for research is seeking a standard methodology to make the digital forensic process accurate, robust, and efficient. The first digital forensic process model proposed contains four steps: Acquisition, Identification, Evaluation and Admission. Since then, numerous process models have been proposed to explain the steps of identifying, acquiring, analysing, storage, and reporting on the evidence obtained from various digital devices. In recent years, an increasing number of more sophisticated process models have been proposed. These models attempt to speed up the entire investigative process or solve various of problems commonly encountered in the forensic investigation. In the last decade, cloud computing has emerged as a disruptive technological concept, and most leading enterprises such as IBM, Amazon, Google, and Microsoft have set up their own cloud-based services. In the field of digital forensic investigation, moving to a cloud-based evidence processing model would be extremely beneficial and preliminary attempts have been made in its implementation. Moving towards a Digital Forensics as a Service model would not only expedite the investigative process, but can also result in significant cost savings - freeing up digital forensic experts and law enforcement personnel to progress their caseload. This paper aims to evaluate the applicability of existing digital forensic process models and analyse how each of these might apply to a cloud-based evidence processing paradigm.
CRAug 5, 2017
Privileged Data within Digital EvidenceDominique Fleurbaaij, Mark Scanlon, Nhien-An Le-Khac
In recent years the use of digital communication has increased. This also increased the chance to find privileged data in the digital evidence. Privileged data is protected by law from viewing by anyone other than the client. It is up to the digital investigator to handle this privileged data properly without being able to view the contents. Procedures on handling this information are available, but do not provide any practical information nor is it known how effective filtering is. The objective of this paper is to describe the handling of privileged data in the current digital forensic tools and the creation of a script within the digital forensic tool Nuix. The script automates the handling of privileged data to minimize the exposure of the contents to the digital investigator. The script also utilizes technology within Nuix that extends the automated search of identical privileged document to relate files based on their contents. A comparison of the 'traditional' ways of filtering within the digital forensic tools and the script written in Nuix showed that digital forensic tools are still limited when used on privileged data. The script manages to increase the effectiveness as direct result of the use of relations based on file content.
CRApr 28, 2017
EviPlant: An efficient digital forensic challenge creation, manipulation and distribution solutionMark Scanlon, Xiaoyu Du, David Lillis
Education and training in digital forensics requires a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on the part of the educator to ensure their viability. Once created, the challenge image needs to be stored and distributed to a class for practical training. This storage and distribution step requires significant time and resources and may not even be possible in an online/distance learning scenario due to the data sizes involved. As part of this paper, we introduce a more capable methodology and system as an alternative to current approaches. EviPlant is a system designed for the efficient creation, manipulation, storage and distribution of challenges for digital forensics education and training. The system relies on the initial distribution of base disk images, i.e., images containing solely base operating systems. In order to create challenges for students, educators can boot the base system, emulate the desired activity and perform a "diffing" of resultant image and the base image. This diffing process extracts the modified artefacts and associated metadata and stores them in an "evidence package". Evidence packages can be created for different personae, different wear-and-tear, different emulated crimes, etc., and multiple evidence packages can be distributed to students and integrated into the base images. A number of additional applications in digital forensic challenge creation for tool testing and validation, proficiency testing, and malware analysis are also discussed as a result of using EviPlant.
CYOct 18, 2016
Towards the Leveraging of Data Deduplication to Break the Disk Acquisition Speed LimitHannah Wolahan, Claudio Chico Lorenzo, Elias Bou-Harb et al.
Digital forensic evidence acquisition speed is traditionally limited by two main factors: the read speed of the storage device being investigated, i.e., the read speed of the disk, memory, remote storage, mobile device, etc.), and the write speed of the system used for storing the acquired data. Digital forensic investigators can somewhat mitigate the latter issue through the use of high-speed storage options, such as networked RAID storage, in the controlled environment of the forensic laboratory. However, traditionally, little can be done to improve the acquisition speed past its physical read speed from the target device itself. The protracted time taken for data acquisition wastes digital forensic experts' time, contributes to digital forensic investigation backlogs worldwide, and delays pertinent information from potentially influencing the direction of an investigation. In a remote acquisition scenario, a third contributing factor can also become a detriment to the overall acquisition time - typically the Internet upload speed of the acquisition system. This paper explores an alternative to the traditional evidence acquisition model through the leveraging of a forensic data deduplication system. The advantages that a deduplicated approach can provide over the current digital forensic evidence acquisition process are outlined and some preliminary results of a prototype implementation are discussed.
CYOct 2, 2016
Battling the Digital Forensic Backlog through Data DeduplicationMark Scanlon
In everyday life. Technological advancement can be found in many facets of life, including personal computers, mobile devices, wearables, cloud services, video gaming, web-powered messaging, social media, Internet-connected devices, etc. This technological influence has resulted in these technologies being employed by criminals to conduct a range of crimes -- both online and offline. Both the number of cases requiring digital forensic analysis and the sheer volume of information to be processed in each case has increased rapidly in recent years. As a result, the requirement for digital forensic investigation has ballooned, and law enforcement agencies throughout the world are scrambling to address this demand. While more and more members of law enforcement are being trained to perform the required investigations, the supply is not keeping up with the demand. Current digital forensic techniques are arduously time-consuming and require a significant amount of man power to execute. This paper discusses a novel solution to combat the digital forensic backlog. This solution leverages a deduplication-based paradigm to eliminate the reacquisition, redundant storage, and reanalysis of previously processed data.
CRApr 13, 2016
Current Challenges and Future Research Areas for Digital Forensic InvestigationDavid Lillis, Brett Becker, Tadhg O'Sullivan et al.
Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs being encountered by law enforcement agencies throughout the world. It can be anticipated that the number of cases requiring digital forensic analysis will greatly increase in the future. It is also likely that each case will require the analysis of an increasing number of devices including computers, smartphones, tablets, cloud-based services, Internet of Things devices, wearables, etc. The variety of new digital evidence sources pose new and challenging problems for the digital investigator from an identification, acquisition, storage and analysis perspective. This paper explores the current challenges contributing to the backlog in digital forensics from a technical standpoint and outlines a number of future research topics that could greatly contribute to a more efficient digital forensic process.
CRApr 13, 2016
Tiered Forensic Methodology Model for Digital Field Triage by Non-Digital Evidence SpecialistsBen Hitchcock, Nhien-An Le-Khac, Mark Scanlon
Due to budgetary constraints and the high level of training required, digital forensic analysts are in short supply in police forces the world over. This inevitably leads to a prolonged time taken between an investigator sending the digital evidence for analysis and receiving the analytical report back. In an attempt to expedite this procedure, various process models have been created to place the forensic analyst in the field conducting a triage of the digital evidence. By conducting triage in the field, an investigator is able to act upon pertinent information quicker, while waiting on the full report. The work presented as part of this paper focuses on the training of front-line personnel in the field triage process, without the need of a forensic analyst attending the scene. The premise has been successfully implemented within regular/non-digital forensics, i.e., crime scene investigation. In that field, front-line members have been trained in specific tasks to supplement the trained specialists. The concept of front-line members conducting triage of digital evidence in the field is achieved through the development of a new process model providing guidance to these members. To prove the model's viability, an implementation of this new process model is presented and evaluated. The results outlined demonstrate how a tiered response involving digital evidence specialists and non-specialists can better deal with the increasing number of investigations involving digital evidence.
DCOct 2, 2015
Towards the Forensic Identification and Investigation of Cloud Hosted Servers through Noninvasive WiretapsHessel Schut, Mark Scanlon, Jason Farina et al.
When conducting modern cybercrime investigations, evidence has often to be gathered from computer systems located at cloud-based data centres of hosting providers. In cases where the investigation cannot rely on the cooperation of the hosting provider, or where documentation is not available, investigators can often find the identification of which distinct server among many is of interest difficult and extremely time consuming. To address the problem of identifying these servers, in this paper a new approach to rapidly and reliably identify these cloud hosting computer systems is presented. In the outlined approach, a handheld device composed of an embedded computer combined with a method of undetectable interception of Ethernet based communications is presented. This device is tested and evaluated, and a discussion is provided on its usefulness in identifying of server of interest to an investigation.
CROct 2, 2015
HTML5 Zero Configuration Covert Channels: Security Risks and ChallengesJason Farina, Mark Scanlon, Stephen Kohlmann et al.
In recent months there has been an increase in the popularity and public awareness of secure, cloudless file transfer systems. The aim of these services is to facilitate the secure transfer of files in a peer-to- peer (P2P) fashion over the Internet without the need for centralised authentication or storage. These services can take the form of client installed applications or entirely web browser based interfaces. Due to their P2P nature, there is generally no limit to the file sizes involved or to the volume of data transmitted - and where these limitations do exist they will be purely reliant on the capacities of the systems at either end of the transfer. By default, many of these services provide seamless, end-to-end encryption to their users. The cybersecurity and cyberforensic consequences of the potential criminal use of such services are significant. The ability to easily transfer encrypted data over the Internet opens up a range of opportunities for illegal use to cybercriminals requiring minimal technical know-how. This paper explores a number of these services and provides an analysis of the risks they pose to corporate and governmental security. A number of methods for the forensic investigation of such transfers are discussed.
CROct 2, 2015
Project Maelstrom: Forensic Analysis of the BitTorrent-Powered BrowserJason Farina, M-Tahar Kechadi, Mark Scanlon
In April 2015, BitTorrent Inc. released their distributed peer-to-peer powered browser, Project Maelstrom, into public beta. The browser facilitates a new alternative website distribution paradigm to the traditional HTTP-based, client-server model. This decentralised web is powered by each of the visitors accessing each Maelstrom hosted website. Each user shares their copy of the website's source code and multimedia content with new visitors. As a result, a Maelstrom hosted website cannot be taken offline by law enforcement or any other parties. Due to this open distribution model, a number of interesting censorship, security and privacy considerations are raised. This paper explores the application, its protocol, sharing Maelstrom content and its new visitor powered "web-hosting" paradigm.
CRJun 3, 2015
Network investigation methodology for BitTorrent Sync: A Peer-to-Peer based file synchronisation serviceMark Scanlon, Jason Farina, M-Tahar Kechadi
High availability is no longer just a business continuity concern. Users are increasingly dependant on devices that consume and produce data in ever increasing volumes. A popular solution is to have a central repository which each device accesses after centrally managed authentication. This model of use is facilitated by cloud based file synchronisation services such as Dropbox, OneDrive, Google Drive and Apple iCloud. Cloud architecture allows the provisioning of storage space with "always-on" access. Recent concerns over unauthorised access to third party systems and large scale exposure of private data have made an alternative solution desirable. These events have caused users to assess their own security practices and the level of trust placed in third party storage services. One option is BitTorrent Sync, a cloudless synchronisation utility provides data availability and redundancy. This utility replicates files stored in shares to remote peers with access controlled by keys and permissions. While lacking the economies brought about by scale, complete control over data access has made this a popular solution. The ability to replicate data without oversight introduces risk of abuse by users as well as difficulties for forensic investigators. This paper suggests a methodology for investigation and analysis of the protocol to assist in the control of data flow across security perimeters.
NISep 30, 2014
Digital Evidence Bag Selection for P2P Network InvestigationMark Scanlon, M-Tahar Kechadi
The collection and handling of court admissible evidence is a fundamental component of any digital forensic investigation. While the procedures for handling digital evidence take much of their influence from the established policies for the collection of physical evidence, due to the obvious differences in dealing with non-physical evidence, a number of extra policies and procedures are required. This paper compares and contrasts some of the existing digital evidence formats or "bags" and analyses them for their compatibility with evidence gathered from a network source. A new digital extended evidence bag is proposed to specifically deal with evidence gathered from P2P networks, incorporating the network byte stream and on-the-fly metadata generation to aid in expedited identification and analysis.
CRSep 30, 2014
The Case for a Collaborative Universal Peer-to-Peer Botnet Investigation FrameworkMark Scanlon, M-Tahar Kechadi
Peer-to-Peer (P2P) botnets are becoming widely used as a low-overhead, efficient, self-maintaining, distributed alternative to the traditional client/server model across a broad range of cyberattacks. These cyberattacks can take the form of distributed denial of service attacks, authentication cracking, spamming, cyberwarfare or malware distribution targeting on financial systems. These attacks can also cross over into the physical world attacking critical infrastructure causing its disruption or destruction (power, communications, water, etc.). P2P technology lends itself well to being exploited for such malicious purposes due to the minimal setup, running and maintenance costs involved in executing a globally orchestrated attack, alongside the perceived additional layer of anonymity. In the ever-evolving space of botnet technology, reducing the time lag between discovering a newly developed or updated botnet system and gaining the ability to mitigate against it is paramount. Often, numerous investigative bodies duplicate their efforts in creating bespoke tools to combat particular threats. This paper outlines a framework capable of fast tracking the investigative process through collaboration between key stakeholders.
CRSep 30, 2014
BitTorrent Sync: Network Investigation MethodologyMark Scanlon, Jason Farina, M-Tahar Kechadi
The volume of personal information and data most Internet users find themselves amassing is ever increasing and the fast pace of the modern world results in most requiring instant access to their files. Millions of these users turn to cloud based file synchronisation services, such as Dropbox, Microsoft Skydrive, Apple iCloud and Google Drive, to enable "always-on" access to their most up-to-date data from any computer or mobile device with an Internet connection. The prevalence of recent articles covering various invasion of privacy issues and data protection breaches in the media has caused many to review their online security practices with their personal information. To provide an alternative to cloud based file backup and synchronisation, BitTorrent Inc. released an alternative cloudless file backup and synchronisation service, named BitTorrent Sync to alpha testers in April 2013. BitTorrent Sync's popularity rose dramatically throughout 2013, reaching over two million active users by the end of the year. This paper outlines a number of scenarios where the network investigation of the service may prove invaluable as part of a digital forensic investigation. An investigation methodology is proposed outlining the required steps involved in retrieving digital evidence from the network and the results from a proof of concept investigation are presented.
CRSep 30, 2014
Leveraging Decentralization to Extend the Digital Evidence Acquisition Window: Case Study on BitTorrent SyncMark Scanlon, Jason Farina, Nhien An Le Khac et al.
File synchronization services such as Dropbox, Google Drive, Microsoft OneDrive, Apple iCloud, etc., are becoming increasingly popular in today's always-connected world. A popular alternative to the aforementioned services is BitTorrent Sync. This is a decentralized/cloudless file synchronization service and is gaining significant popularity among Internet users with privacy concerns over where their data is stored and who has the ability to access it. The focus of this paper is the remote recovery of digital evidence pertaining to files identified as being accessed or stored on a suspect's computer or mobile device. A methodology for the identification, investigation, recovery and verification of such remote digital evidence is outlined. Finally, a proof-of-concept remote evidence recovery from BitTorrent Sync shared folder highlighting a number of potential scenarios for the recovery and verification of such evidence.
CRSep 29, 2014
BitTorrent Sync: First Impressions and Digital Forensic ImplicationsJason Farina, Mark Scanlon, M-Tahar Kechadi
With professional and home Internet users becoming increasingly concerned with data protection and privacy, the privacy afforded by popular cloud file synchronisation services, such as Dropbox, OneDrive and Google Drive, is coming under scrutiny in the press. A number of these services have recently been reported as sharing information with governmental security agencies without warrants. BitTorrent Sync is seen as an alternative by many and has gathered over two million users by December 2013 (doubling since the previous month). The service is completely decentralised, offers much of the same synchronisation functionality of cloud powered services and utilises encryption for data transmission (and optionally for remote storage). The importance of understanding BitTorrent Sync and its resulting digital investigative implications for law enforcement and forensic investigators will be paramount to future investigations. This paper outlines the client application, its detected network traffic and identifies artefacts that may be of value as evidence for future digital investigations.