M-Tahar Kechadi

CR
h-index6
28papers
520citations
Novelty26%
AI Score40

28 Papers

29.2CLApr 6Code
LiveFact: A Dynamic, Time-Aware Benchmark for LLM-Driven Fake News Detection

Cheng Xu, Changhong Jin, Yingjie Niu et al.

The rapid development of Large Language Models (LLMs) has transformed fake news detection and fact-checking tasks from simple classification to complex reasoning. However, evaluation frameworks have not kept pace. Current benchmarks are static, making them vulnerable to benchmark data contamination (BDC) and ineffective at assessing reasoning under temporal uncertainty. To address this, we introduce LiveFact a continuously updated benchmark that simulates the real-world "fog of war" in misinformation detection. LiveFact uses dynamic, temporal evidence sets to evaluate models on their ability to reason with evolving, incomplete information rather than on memorized knowledge. We propose a dual-mode evaluation: Classification Mode for final verification and Inference Mode for evidence-based reasoning, along with a component to monitor BDC explicitly. Tests with 22 LLMs show that open-source Mixture-of-Experts models, such as Qwen3-235B-A22B, now match or outperform proprietary state-of-the-art systems. More importantly, our analysis finds a significant "reasoning gap." Capable models exhibit epistemic humility by recognizing unverifiable claims in early data slices-an aspect traditional static benchmarks overlook. LiveFact sets a sustainable standard for evaluating robust, temporally aware AI verification.

CLJul 15, 2025
DCR: Quantifying Data Contamination in LLMs Evaluation

Cheng Xu, Nan Yan, Shuhao Guan et al.

The rapid advancement of large language models (LLMs) has heightened concerns about benchmark data contamination (BDC), where models inadvertently memorize evaluation data during the training process, inflating performance metrics, and undermining genuine generalization assessment. This paper introduces the Data Contamination Risk (DCR) framework, a lightweight, interpretable pipeline designed to detect and quantify BDC risk across four granular levels: semantic, informational, data, and label. By synthesizing contamination scores via a fuzzy inference system, DCR produces a unified DCR Factor that adjusts raw accuracy to reflect contamination-aware performance. Validated on 9 LLMs (0.5B-72B) across sentiment analysis, fake news detection, and arithmetic reasoning tasks, the DCR framework reliably diagnoses contamination severity and with accuracy adjusted using the DCR Factor to within 4% average error across the three benchmarks compared to the uncontaminated baseline. Emphasizing computational efficiency and transparency, DCR provides a practical tool for integrating contamination assessment into routine evaluations, fostering fairer comparisons and enhancing the credibility of LLM benchmarking practices.

CLJun 6, 2024
Benchmark Data Contamination of Large Language Models: A Survey

Cheng Xu, Shuhao Guan, Derek Greene et al.

The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and Gemini has transformed the field of natural language processing. However, it has also resulted in a significant issue known as Benchmark Data Contamination (BDC). This occurs when language models inadvertently incorporate evaluation benchmark information from their training data, leading to inaccurate or unreliable performance during the evaluation phase of the process. This paper reviews the complex challenge of BDC in LLM evaluation and explores alternative assessment methods to mitigate the risks associated with traditional benchmarks. The paper also examines challenges and future directions in mitigating BDC risks, highlighting the complexity of the issue and the need for innovative solutions to ensure the reliability of LLM evaluation in real-world applications.

IRMar 21, 2021
Structural Textile Pattern Recognition and Processing Based on Hypergraphs

Vuong M. Ngo, Sven Helmer, Nhien-An Le-Khac et al.

The humanities, like many other areas of society, are currently undergoing major changes in the wake of digital transformation. However, in order to make collection of digitised material in this area easily accessible, we often still lack adequate search functionality. For instance, digital archives for textiles offer keyword search, which is fairly well understood, and arrange their content following a certain taxonomy, but search functionality at the level of thread structure is still missing. To facilitate the clustering and search, we introduce an approach for recognising similar weaving patterns based on their structures for textile archives. We first represent textile structures using hypergraphs and extract multisets of k-neighbourhoods describing weaving patterns from these graphs. Then, the resulting multisets are clustered using various distance measures and various clustering algorithms (K-Means for simplicity and hierarchical agglomerative algorithms for precision). We evaluate the different variants of our approach experimentally, showing that this can be implemented efficiently (meaning it has linear complexity), and demonstrate its quality to query and cluster datasets containing large textile samples. As, to the est of our knowledge, this is the first practical approach for explicitly modelling complex and irregular weaving patterns usable for retrieval, we aim at establishing a solid baseline.

DBMar 11, 2020
Crop Knowledge Discovery Based on Agricultural Big Data Integration

Vuong M. Ngo, M-Tahar Kechadi

Nowadays, the agricultural data can be generated through various sources, such as: Internet of Thing (IoT), sensors, satellites, weather stations, robots, farm equipment, agricultural laboratories, farmers, government agencies and agribusinesses. The analysis of this big data enables farmers, companies and agronomists to extract high business and scientific knowledge, improving their operational processes and product quality. However, before analysing this data, different data sources need to be normalised, homogenised and integrated into a unified data representation. In this paper, we propose an agricultural data integration method using a constellation schema which is designed to be flexible enough to incorporate other datasets and big data models. We also apply some methods to extract knowledge with the view to improve crop yield; these include finding suitable quantities of soil properties, herbicides and insecticides for both increasing crop yield and protecting the environment.

AIOct 23, 2019
Knowledge Map: Toward a New Approach Supporting the Knowledge Management in Distributed Data Mining

Nhien-An Le-Khac, Lamine M. Aouad, M-Tahar Kechadi

Distributed data mining (DDM) deals with the problem of finding patterns or models, called knowledge, in an environment with distributed data and computations. Today, a massive amounts of data which are often geographically distributed and owned by different organisation are being mined. As consequence, a large mount of knowledge are being produced. This causes problems of not only knowledge management but also visualization in data mining. Besides, the main aim of DDM is to exploit fully the benefit of distributed data analysis while minimising the communication. Existing DDM techniques perform partial analysis of local data at individual sites and then generate a global model by aggregating these local results. These two steps are not independent since naive approaches to local analysis may produce an incorrect and ambiguous global data model. The integrating and cooperating of these two steps need an effective knowledge management, concretely an efficient map of knowledge in order to take the advantage of mined knowledge to guide mining the data. In this paper, we present "knowledge map", a representation of knowledge about mined knowledge. This new approach aims to manage efficiently mined knowledge in large scale distributed platform such as Grid. This knowledge map is used to facilitate not only the visualization, evaluation of mining results but also the coordinating of local mining process and existing knowledge to increase the accuracy of final model.

AIOct 21, 2019
Recurrent neural network approach for cyclic job shop scheduling problem

M-Tahar Kechadi, Kok Seng Low, G. Goncalves

While cyclic scheduling is involved in numerous real-world applications, solving the derived problem is still of exponential complexity. This paper focuses specifically on modelling the manufacturing application as a cyclic job shop problem and we have developed an efficient neural network approach to minimise the cycle time of a schedule. Our approach introduces an interesting model for a manufacturing production, and it is also very efficient, adaptive and flexible enough to work with other techniques. Experimental results validated the approach and confirmed our hypotheses about the system model and the efficiency of neural networks for such a class of problems.

CRAug 27, 2019
A Security-Aware Access Model for Data-Driven EHR System

Ngoc Hong Tran, Thien-An Nguyen-Ngoc, Nhien-An Le-Khac et al.

Digital healthcare systems are very popular lately, as they provide a variety of helpful means to monitor people's health state as well as to protect people against an unexpected health situation. These systems contain a huge amount of personal information in a form of electronic health records that are not allowed to be disclosed to unauthorized users. Hence, health data and information need to be protected against attacks and thefts. In this paper, we propose a secure distributed architecture for healthcare data storage and analysis. It uses a novel security model to rigorously control permissions of accessing sensitive data in the system, as well as to protect the transmitted data between distributed system servers and nodes. The model also satisfies the NIST security requirements. Thorough experimental results show that the model is very promising.

DBMay 29, 2019
Designing and Implementing Data Warehouse for Agricultural Big Data

Vuong M. Ngo, Nhien-An Le-Khac, M-Tahar Kechadi

In recent years, precision agriculture that uses modern information and communication technologies is becoming very popular. Raw and semi-processed agricultural data are usually collected through various sources, such as: Internet of Thing (IoT), sensors, satellites, weather stations, robots, farm equipment, farmers and agribusinesses, etc. Besides, agricultural datasets are very large, complex, unstructured, heterogeneous, non-standardized, and inconsistent. Hence, the agricultural data mining is considered as Big Data application in terms of volume, variety, velocity and veracity. It is a key foundation to establishing a crop intelligence platform, which will enable resource efficient agronomy decision making and recommendations. In this paper, we designed and implemented a continental level agricultural data warehouse by combining Hive, MongoDB and Cassandra. Our data warehouse capabilities: (1) flexible schema; (2) data integration from real agricultural multi datasets; (3) data science and business intelligent support; (4) high performance; (5) high storage; (6) security; (7) governance and monitoring; (8) replication and recovery; (9) consistency, availability and partition tolerant; (10) distributed and cloud deployment. We also evaluate the performance of our data warehouse.

DBFeb 1, 2018
Hierarchical Aggregation Approach for Distributed clustering of spatial datasets

Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

In this paper, we present a new approach of distributed clustering for spatial datasets, based on an innovative and efficient aggregation technique. This distributed approach consists of two phases: 1) local clustering phase, where each node performs a clustering on its local data, 2) aggregation phase, where the local clusters are aggregated to produce global clusters. This approach is characterised by the fact that the local clusters are represented in a simple and efficient way. And The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in both response time and memory allocation. We evaluated the approach with different datasets and compared it to well-known clustering techniques. The experimental results show that our approach is very promising and outperforms all those algorithms

CRAug 29, 2017
Increasing digital investigator availability through efficient workflow management and automation

Ronald In de Braekt, Nhien-An Le-Khac, Jason Farina et al.

The growth of digital storage capacities and diversity devices has had a significant time impact on digital forensic laboratories in law enforcement. Backlogs have become commonplace and increasingly more time is spent in the acquisition and preparation steps of an investigation as opposed to detailed evidence analysis and reporting. There is generally little room for increasing digital investigation capacity in law enforcement digital forensic units and the allocated budgets for these units are often decreasing. In the context of developing an efficient investigation process, one of the key challenges amounts to how to achieve more with less. This paper proposes a workflow management automation framework for handling common digital forensic tools. The objective is to streamline the digital investigation workflow - enabling more efficient use of limited hardware and software. The proposed automation framework reduces the time digital forensic experts waste conducting time-consuming, though necessary, tasks. The evidence processing time is decreased through server-side automation resulting in 24/7 evidence preparation. The proposed framework increases efficiency of use of forensic software and hardware, reduces the infrastructure costs and license fees, and simplifies the preparation steps for the digital investigator. The proposed approach is evaluated in a real-world scenario to evaluate its robustness and highlight its benefits.

CRAug 29, 2017
Investigation and Automating Extraction of Thumbnails Produced by Image viewers

Wybren van der Meer, Kim-Kwang Raymond Choo, Nhien-An Le-Khac et al.

Today, in digital forensics, images normally provide important information within an investigation. However, not all images may still be available within a forensic digital investigation as they were all deleted for example. Data carving can be used in this case to retrieve deleted images but the carving time is normally significant and these images can be moreover overwritten by other data. One of the solutions is to look at thumbnails of images that are no longer available. These thumbnails can often be found within databases created by either operating systems or image viewers. In literature, most research and practical focus on the extraction of thumbnails from databases created by the operating system. There is a little research working on the thumbnails created by the image reviewers as these thumbnails are application-driven in terms of pre-defined sizes, adjustments and storage location. Eventually, thumbnail databases from image viewers are significant forensic artefacts for investigators as these programs deal with large amounts of images. However, investigating these databases so far is still manual or semi-automatic task that leads to the huge amount of forensic time. Therefore, in this paper we propose a new approach of automating extraction of thumbnails produced by image viewers. We also test our approach with popular image viewers in different storage structures and locations to show its robustness.

DCApr 11, 2017
Feature Selection Parallel Technique for Remotely Sensed Imagery Classification

Nhien-An Le-Khac, M-Tahar Kechadi, Bo Wu et al.

Remote sensing research focusing on feature selection has long attracted the attention of the remote sensing community because feature selection is a prerequisite for image processing and various applications. Different feature selection methods have been proposed to improve the classification accuracy. They vary from basic search techniques to clonal selections, and various optimal criteria have been investigated. Recently, methods using dependence-based measures have attracted much attention due to their ability to deal with very high dimensional datasets. However, these methods are based on Cramers V test, which has performance issues with large datasets. In this paper, we propose a parallel approach to improve their performance. We evaluate our approach on hyper-spectral and high spatial resolution images and compare it to the proposed methods with a centralized version as preliminary results. The results are very promising.

CRApr 11, 2017
Forensic Analysis of TomTom Navigation Application

Nhien-An Le-Khac, Mark Roeloffs, M-Tahar Kechadi

In the forensic field of digital technology, there has been a great deal of investigation into the decoding of navigation systems of the brand TomTom. As TomTom is the market leader in navigation systems, a large number of these devices are investigated. These devices can hold an abundance of significant location information. Currently, it is possible with the use of multiple methods to make physical copies of mobile devices running Android. The next great forensic problem is all the various programs that can be installed on these devices. There is now an application available from the company TomTom in the Google Play Store. This application mimics a navigation system on your Android mobile device. Indeed, the TomTom application on Android can hold a great deal of information. In this paper, we present a process of forensic acquisition and analysis of the TomTom Android application. We focus on the following questions: Is there a possibility to find previously driven routes or GPS coordinates with timestamps in the memory of the mobile device? To investigate what is stored in these files, driving tests were performed. During these driving tests a copy was made of the most important file using a self-written program. The significant files were found and the data in these files was decoded. We show and analyse our results with Samsung mobile devices. We compare also these results with forensic acquisition from TomTom GPS devices.

DBApr 11, 2017
Efficient Large Scale Clustering based on Data Partitioning

Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high complexity of some algorithms. For instance, some algorithms may have linear complexity but they require the domain knowledge in order to determine their input parameters. Distributed clustering techniques constitute a very good alternative to the big data challenges (e.g.,Volume, Variety, Veracity, and Velocity). Usually these techniques consist of two phases. The first phase generates local models or patterns and the second one tends to aggregate the local results to obtain global models. While the first phase can be executed in parallel on each site and, therefore, efficient, the aggregation phase is complex, time consuming and may produce incorrect and ambiguous global clusters and therefore incorrect models. In this paper we propose a new distributed clustering approach to deal efficiently with both phases, generation of local results and generation of global models by aggregation. For the first phase, our approach is capable of analysing the datasets located in each site using different clustering techniques. The aggregation phase is designed in such a way that the final clusters are compact and accurate while the overall process is efficient in time and memory allocation. For the evaluation, we use two well-known clustering algorithms, K-Means and DBSCAN. One of the key outputs of this distributed clustering technique is that the number of global clusters is dynamic, no need to be fixed in advance. Experimental results show that the approach is scalable and produces high quality results.

CRDec 1, 2016
Forensics Acquisition and Analysis of instant messaging and VoIP applications

Christos Sgaras, M-Tahar Kechadi, Nhien-An Le-Khac

The advent of the Internet has significantly transformed the daily activities of millions of people, with one of them being the way people communicate where Instant Messaging (IM) and Voice over IP (VoIP) communications have become prevalent. Although IM applications are ubiquitous communication tools nowadays, it was observed that the relevant research on the topic of evidence collection from IM services was limited. The reason is an IM can serve as a very useful yet very dangerous platform for the victim and the suspect to communicate. Indeed, the increased use of Instant Messengers on smart phones has turned to be the goldmine for mobile and computer forensic experts. Traces and Evidence left by applications can be held on smart phones and retrieving those potential evidences with right forensic technique is strongly required. Recently, most research on IM forensics focus on applications such as WhatsApp, Viber and Skype. However, in the literature, there are very few forensic analysis and comparison related to IM applications such as WhatsApp, Viber and Skype and Tango on both iOS and Android platforms, even though the total users of this application already exceeded 1 billion. Therefore, in this paper we present forensic acquisition and analysis of these four IMs and VoIPs for both iOS and Android platforms. We try to answer on how evidence can be collected when IM communications are used. We also define taxonomy of target artefacts in order to guide and structure the subsequent forensic analysis. Finally, a review of the information that can become available via the IM vendor was conducted. The achieved results of this research provided elaborative answers on the types of artifacts that can be identified by these IM and VoIP applications. We compare moreover the forensics analysis of these popular applications: WhatApp, Skype, Viber and Tango.

CRNov 29, 2016
The State of the Art Forensic Techniques in Mobile Cloud Environment: A Survey, Challenges and Current Trends

Muhammad Faheem, M-Tahar Kechadi, Nhien-An Le-Khac

Smartphones have become popular in recent days due to the accessibility of a wide range of applications. These sophisticated applications demand more computing resources in a resource constraint smartphone. Cloud computing is the motivating factor for the progress of these applications. The emerging mobile cloud computing introduces a new architecture to offload smartphone and utilize cloud computing technology to solve resource requirements. The popularity of mobile cloud computing is an opportunity for misuse and unlawful activities. Therefore, it is a challenging platform for digital forensic investigations due to the non-availability of methodologies, tools and techniques. The aim of this work is to analyze the forensic tools and methodologies for crime investigation in a mobile cloud platform as it poses challenges in proving the evidence. The advancement of forensic tools and methodologies are much slower than the current technology development in mobile cloud computing. Thus, forces the available tools, and techniques become increasingly obsolete. Therefore, it opens up the door for the new forensic tools and techniques to cope up with recent developments. Hence, this work presents a detailed survey of forensic methodology and corresponding issues in a mobile device, cloud environment, and mobile cloud applications. It mainly focuses on digital forensic issues related to mobile cloud applications and also analyze the scope, challenges and opportunities. Finally, this work reviewed the forensic procedures of two cloud storage services used for mobile cloud applications such as Dropbox and SkyDrive.

CRNov 29, 2016
Toward a new mobile cloud forensic framework

Muhammad Faheem, M-Tahar Kechadi, Nhien-An Le-Khac

Smartphones have created a significant impact on the day to day activities of every individual. Now a days a wide range of Smartphone applications are available and it necessitates high computing resources in order to build these applications. Cloud computing offers enormous resources and extends services to resource-constrained mobile devices. Mobile Cloud Computing is emerging as a key technology to utilize virtually unlimited resources over the Internet using Smartphones. Offloading data and computations to improve productivity, enhance performance, save energy, and improve user experience. Social network applications largely utilize Mobile Cloud Computing to reap the benefits. The social network has witnessed unprecedented growth in the recent years, and millions of registered users access it using Smartphones. The mobile cloud social network applications introduce not only convenience but also various issues related to criminal and illegal activities. Despite being primarily used to communicate and socialize with contacts, the multifarious and anonymous nature of social networking websites increases susceptibility to cybercrimes. Taking into account, the advantage of mobile cloud computing and popularity of social network applications, it is essential to establish a forensic framework based on mobile cloud platform that solves the problems of today forensic requirements. In this paper we present a mobile cloud forensic framework that allows the forensic investigator to collect the automated synchronized copies of data on both mobile and cloud servers to prove the evidence of cloud usage. We also show our preliminary results of this study.

CRSep 9, 2016
Android and Wireless data-extraction using Wi-Fi

Bert Busstra, Nhien-An Le-Khac, M-Tahar Kechadi

Today, mobile phones are very popular, fast growing technology. Mobile phones of the present day are more and more like small computers. The so-called "smartphones" contain a wealth of information each. This information has been proven to be very useful in crime investigations, because relevant evidence can be found in data retrieved from mobile phones used by criminals. In traditional methods, the data from mobile phones can be extracted using an USB-cable. However, for some reason this USB-cable connection cannot be made, the data should be extracted in an alternative way. In this paper, we study the possibility of extracting data from mobile devices using a Wi-Fi connection. We describe our approach on mobile devices containing the Android Operating System. Through our experiments, we also give recommendation on which application and protocol can be used best to retrieve data.

CRSep 9, 2016
Forensics Acquisition of IMVU: A Case Study

Robert van Voorst, M-Tahar Kechadi, Nhien-An Le-Khac

There are many applications available for personal computers and mobile devices that facilitate users in meeting potential partners. There is, however, a risk associated with the level of anonymity on using instant message applications, because there exists the potential for predators to attract and lure vulnerable users. Today Instant Messaging within a Virtual Universe (IMVU) combines custom avatars, chat or instant message (IM), community, content creation, commerce, and anonymity. IMVU is also being exploited by criminals to commit a wide variety of offenses. However, there are very few researches on digital forensic acquisition of IMVU applications. In this paper, we discuss first of all on challenges of IMVU forensics. We present a forensic acquisition of an IMVU 3D application as a case study. We also describe and analyse our experiments with this application.

CROct 2, 2015
HTML5 Zero Configuration Covert Channels: Security Risks and Challenges

Jason Farina, Mark Scanlon, Stephen Kohlmann et al.

In recent months there has been an increase in the popularity and public awareness of secure, cloudless file transfer systems. The aim of these services is to facilitate the secure transfer of files in a peer-to- peer (P2P) fashion over the Internet without the need for centralised authentication or storage. These services can take the form of client installed applications or entirely web browser based interfaces. Due to their P2P nature, there is generally no limit to the file sizes involved or to the volume of data transmitted - and where these limitations do exist they will be purely reliant on the capacities of the systems at either end of the transfer. By default, many of these services provide seamless, end-to-end encryption to their users. The cybersecurity and cyberforensic consequences of the potential criminal use of such services are significant. The ability to easily transfer encrypted data over the Internet opens up a range of opportunities for illegal use to cybercriminals requiring minimal technical know-how. This paper explores a number of these services and provides an analysis of the risks they pose to corporate and governmental security. A number of methods for the forensic investigation of such transfers are discussed.

CROct 2, 2015
Project Maelstrom: Forensic Analysis of the BitTorrent-Powered Browser

Jason Farina, M-Tahar Kechadi, Mark Scanlon

In April 2015, BitTorrent Inc. released their distributed peer-to-peer powered browser, Project Maelstrom, into public beta. The browser facilitates a new alternative website distribution paradigm to the traditional HTTP-based, client-server model. This decentralised web is powered by each of the visitors accessing each Maelstrom hosted website. Each user shares their copy of the website's source code and multimedia content with new visitors. As a result, a Maelstrom hosted website cannot be taken offline by law enforcement or any other parties. Due to this open distribution model, a number of interesting censorship, security and privacy considerations are raised. This paper explores the application, its protocol, sharing Maelstrom content and its new visitor powered "web-hosting" paradigm.

CRJun 3, 2015
Network investigation methodology for BitTorrent Sync: A Peer-to-Peer based file synchronisation service

Mark Scanlon, Jason Farina, M-Tahar Kechadi

High availability is no longer just a business continuity concern. Users are increasingly dependant on devices that consume and produce data in ever increasing volumes. A popular solution is to have a central repository which each device accesses after centrally managed authentication. This model of use is facilitated by cloud based file synchronisation services such as Dropbox, OneDrive, Google Drive and Apple iCloud. Cloud architecture allows the provisioning of storage space with "always-on" access. Recent concerns over unauthorised access to third party systems and large scale exposure of private data have made an alternative solution desirable. These events have caused users to assess their own security practices and the level of trust placed in third party storage services. One option is BitTorrent Sync, a cloudless synchronisation utility provides data availability and redundancy. This utility replicates files stored in shares to remote peers with access controlled by keys and permissions. While lacking the economies brought about by scale, complete control over data access has made this a popular solution. The ability to replicate data without oversight introduces risk of abuse by users as well as difficulties for forensic investigators. This paper suggests a methodology for investigation and analysis of the protocol to assist in the control of data flow across security perimeters.

NISep 30, 2014
Digital Evidence Bag Selection for P2P Network Investigation

Mark Scanlon, M-Tahar Kechadi

The collection and handling of court admissible evidence is a fundamental component of any digital forensic investigation. While the procedures for handling digital evidence take much of their influence from the established policies for the collection of physical evidence, due to the obvious differences in dealing with non-physical evidence, a number of extra policies and procedures are required. This paper compares and contrasts some of the existing digital evidence formats or "bags" and analyses them for their compatibility with evidence gathered from a network source. A new digital extended evidence bag is proposed to specifically deal with evidence gathered from P2P networks, incorporating the network byte stream and on-the-fly metadata generation to aid in expedited identification and analysis.

CRSep 30, 2014
The Case for a Collaborative Universal Peer-to-Peer Botnet Investigation Framework

Mark Scanlon, M-Tahar Kechadi

Peer-to-Peer (P2P) botnets are becoming widely used as a low-overhead, efficient, self-maintaining, distributed alternative to the traditional client/server model across a broad range of cyberattacks. These cyberattacks can take the form of distributed denial of service attacks, authentication cracking, spamming, cyberwarfare or malware distribution targeting on financial systems. These attacks can also cross over into the physical world attacking critical infrastructure causing its disruption or destruction (power, communications, water, etc.). P2P technology lends itself well to being exploited for such malicious purposes due to the minimal setup, running and maintenance costs involved in executing a globally orchestrated attack, alongside the perceived additional layer of anonymity. In the ever-evolving space of botnet technology, reducing the time lag between discovering a newly developed or updated botnet system and gaining the ability to mitigate against it is paramount. Often, numerous investigative bodies duplicate their efforts in creating bespoke tools to combat particular threats. This paper outlines a framework capable of fast tracking the investigative process through collaboration between key stakeholders.

CRSep 30, 2014
BitTorrent Sync: Network Investigation Methodology

Mark Scanlon, Jason Farina, M-Tahar Kechadi

The volume of personal information and data most Internet users find themselves amassing is ever increasing and the fast pace of the modern world results in most requiring instant access to their files. Millions of these users turn to cloud based file synchronisation services, such as Dropbox, Microsoft Skydrive, Apple iCloud and Google Drive, to enable "always-on" access to their most up-to-date data from any computer or mobile device with an Internet connection. The prevalence of recent articles covering various invasion of privacy issues and data protection breaches in the media has caused many to review their online security practices with their personal information. To provide an alternative to cloud based file backup and synchronisation, BitTorrent Inc. released an alternative cloudless file backup and synchronisation service, named BitTorrent Sync to alpha testers in April 2013. BitTorrent Sync's popularity rose dramatically throughout 2013, reaching over two million active users by the end of the year. This paper outlines a number of scenarios where the network investigation of the service may prove invaluable as part of a digital forensic investigation. An investigation methodology is proposed outlining the required steps involved in retrieving digital evidence from the network and the results from a proof of concept investigation are presented.

CRSep 29, 2014
BitTorrent Sync: First Impressions and Digital Forensic Implications

Jason Farina, Mark Scanlon, M-Tahar Kechadi

With professional and home Internet users becoming increasingly concerned with data protection and privacy, the privacy afforded by popular cloud file synchronisation services, such as Dropbox, OneDrive and Google Drive, is coming under scrutiny in the press. A number of these services have recently been reported as sharing information with governmental security agencies without warrants. BitTorrent Sync is seen as an alternative by many and has gathered over two million users by December 2013 (doubling since the previous month). The service is completely decentralised, offers much of the same synchronisation functionality of cloud powered services and utilises encryption for data transmission (and optionally for remote storage). The importance of understanding BitTorrent Sync and its resulting digital investigative implications for law enforcement and forensic investigators will be paramount to future investigations. This paper outlines the client application, its detected network traffic and identifies artefacts that may be of value as evidence for future digital investigations.

NENov 30, 2013
A Framework for Genetic Algorithms Based on Hadoop

Filomena Ferrucci, M-Tahar Kechadi, Pasquale Salza et al.

Genetic Algorithms (GAs) are powerful metaheuristic techniques mostly used in many real-world applications. The sequential execution of GAs requires considerable computational power both in time and resources. Nevertheless, GAs are naturally parallel and accessing a parallel platform such as Cloud is easy and cheap. Apache Hadoop is one of the common services that can be used for parallel applications. However, using Hadoop to develop a parallel version of GAs is not simple without facing its inner workings. Even though some sequential frameworks for GAs already exist, there is no framework supporting the development of GA applications that can be executed in parallel. In this paper is described a framework for parallel GAs on the Hadoop platform, following the paradigm of MapReduce. The main purpose of this framework is to allow the user to focus on the aspects of GA that are specific to the problem to be addressed, being sure that this task is going to be correctly executed on the Cloud with a good performance. The framework has been also exploited to develop an application for Feature Subset Selection problem. A preliminary analysis of the performance of the developed GA application has been performed using three datasets and shown very promising performance.