CR · Dec 10, 2025
Comparative Analysis of Hash-based Malware Clustering via K-Means
Aink Acrie Soe Thein, Nikolaos Pitropakis, Pavlos Papadopoulos et al.
With the adoption of multiple digital devices in everyday life, the cyber-attack surface has increased. Adversaries are continuously exploring new avenues to exploit them and deploy malware. On the other hand, detection approaches typically employ hashing-based algorithms such as SSDeep, TLSH, and IMPHash to capture structural and behavioural similarities among binaries. This work focuses on the analysis and evaluation of these techniques for clustering malware samples using the K-means algorithm. More specifically, we experimented with established malware families and traits and found that TLSH and IMPHash produce more distinct, semantically meaningful clusters, whereas SSDeep is more efficient for broader classification tasks. The findings of this work can guide the development of more robust threat-detection mechanisms and adaptive security mechanisms.
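As a rough illustration of clustering hash digests with K-means, the sketch below runs plain Lloyd's algorithm over fixed-length hex strings. The nibble-per-character encoding, the helper names, and the made-up digests are all assumptions for illustration; the abstract does not say how digests were turned into features, and real SSDeep or TLSH digests come with their own comparison functions.

```python
import random

def hex_to_vector(digest):
    """Map a hex digest to one integer (0-15) per hex character (assumed encoding)."""
    return [int(ch, 16) for ch in digest]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: pick the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Made-up 8-character hex strings standing in for fixed-length digests.
digests = ["00112233", "00112234", "01112233", "ffeeddcc", "ffeeddcb", "feeeddcc"]
labels = kmeans([hex_to_vector(d) for d in digests], k=2)
```

The first three toy digests should share one label and the last three the other, since they differ by only one character within each group.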
CV · Dec 19, 2025
Adversarial Robustness of Vision in Open Foundation Models
Jonathon Fox, William J Buchanan, Pavlos Papadopoulos
With the growth of deep learning, it has become increasingly difficult to understand how AI systems identify objects. An adversary could therefore modify an image by adding unseen elements that confuse the AI in its recognition of an entity. This paper investigates the adversarial robustness of LLaVA-1.5-13B and Meta's Llama 3.2 Vision-8B-2. Both models are tested against untargeted PGD (Projected Gradient Descent) attacks on the visual input modality and empirically evaluated on a subset of the Visual Question Answering (VQA) v2 dataset. The results of these adversarial attacks are quantified using the standard VQA accuracy metric, and the resulting accuracy degradation (accuracy drop) of LLaVA and Llama 3.2 Vision is then compared. A key finding is that Llama 3.2 Vision, despite a lower baseline accuracy in this setup, exhibited a smaller drop in performance under attack compared to LLaVA, particularly at higher perturbation levels. Overall, the findings confirm that the vision modality represents a viable attack vector for degrading the performance of contemporary open-weight VLMs, including Meta's Llama 3.2 Vision. Furthermore, they highlight that adversarial robustness does not necessarily correlate directly with standard benchmark performance and may be influenced by underlying architectural and training factors.
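To make the untargeted PGD step concrete, here is a minimal sketch on a toy logistic model rather than a vision-language model. The model, weights, input values, and the `eps`/`alpha`/`steps` settings are illustrative assumptions and are not taken from the paper. Each iteration takes a signed gradient-ascent step on the loss and then projects back into the L-infinity ball around the clean input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def pgd_untargeted(x0, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """Maximise the loss on the true label y while staying within eps of x0."""
    x = list(x0)
    for _ in range(steps):
        _, grad = loss_and_grad(x, y, w, b)
        # Ascent step along the sign of the input gradient.
        x = [xi + alpha * sign(g) for xi, g in zip(x, grad)]
        # Projection: stay inside the eps-ball and the valid [0, 1] input range.
        x = [min(max(xi, x0i - eps, 0.0), x0i + eps, 1.0) for xi, x0i in zip(x, x0)]
    return x

# Hypothetical toy model and input (values in [0, 1], like normalised pixels).
w, b = [2.0, -3.0, 1.0], 0.1
x_clean, y_true = [0.5, 0.2, 0.8], 1
x_adv = pgd_untargeted(x_clean, y_true, w, b)
clean_loss, _ = loss_and_grad(x_clean, y_true, w, b)
adv_loss, _ = loss_and_grad(x_adv, y_true, w, b)
```

In this toy setting the adversarial loss ends up higher than the clean loss while every coordinate stays within `eps` of the original input.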
CR · Nov 18, 2020 · Code
Asymmetric Private Set Intersection with Applications to Contact Tracing and Private Vertical Federated Machine Learning
Nick Angelou, Ayoub Benaissa, Bogdan Cebere et al.
We present a multi-language, cross-platform, open-source library for asymmetric private set intersection (PSI) and PSI-Cardinality (PSI-C). Our protocol combines traditional DDH-based PSI and PSI-C protocols with compression based on Bloom filters that helps reduce communication in the asymmetric setting. Currently, our library supports C++, C, Go, WebAssembly, JavaScript, Python, and Rust, and runs on both traditional hardware (x86) and browser targets. We further apply our library to two use cases: (i) a privacy-preserving contact tracing protocol that is compatible with existing approaches, but improves their privacy guarantees, and (ii) privacy-preserving machine learning on vertically partitioned data.
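The combination of exponentiation-based PSI with a Bloom filter can be sketched in a few lines. This is a toy under loud assumptions: it does not use the library's API, the modulus `2**127 - 1` is chosen only because it is a known prime and is not a secure group choice, the Bloom filter parameters are arbitrary, and the seeded `random.Random` stands in for a cryptographic random source. The server publishes a Bloom filter of its blinded items; the client blinds its items, has the server re-blind them, removes its own blinding, and checks membership.

```python
import hashlib
import math
import random

P = 2**127 - 1  # Mersenne prime used as a toy modulus; NOT a secure parameter choice

def hash_to_group(item):
    """Hash a string to an integer in [1, P-1]."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def random_exponent(rng):
    """Pick an exponent invertible modulo P-1 so the blinding can be undone."""
    while True:
        e = rng.randrange(2, P - 1)
        if math.gcd(e, P - 1) == 1:
            return e

class BloomFilter:
    def __init__(self, num_bits=4096, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, value):
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h, "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, value):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))

def psi_demo(client_items, server_items, seed=0):
    rng = random.Random(seed)  # deterministic for the demo only
    a, b = random_exponent(rng), random_exponent(rng)  # client and server secrets
    # Server setup: compress the blinded set H(y)^b into a Bloom filter.
    bloom = BloomFilter()
    for y in server_items:
        bloom.add(pow(hash_to_group(y), b, P))
    # Client request: H(x)^a for each client item.
    request = [pow(hash_to_group(x), a, P) for x in client_items]
    # Server response: (H(x)^a)^b.
    response = [pow(v, b, P) for v in request]
    # Client: strip its own exponent to recover H(x)^b, then test membership.
    a_inv = pow(a, -1, P - 1)  # modular inverse, Python 3.8+
    return [x for x, v in zip(client_items, response) if pow(v, a_inv, P) in bloom]

common = psi_demo(["alice", "bob", "carol"], ["bob", "dave", "carol", "erin"])
```

Because the client only sends its own (small) set while the server's larger set travels once as a compact filter, the sketch reflects why this layout suits the asymmetric setting; Bloom filters can report false positives, so a real deployment has to size the filter accordingly.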
CR · Feb 7, 2022
Ransomware: Analysing the Impact on Windows Active Directory Domain Services
Grant McDonald, Pavlos Papadopoulos, Nikolaos Pitropakis et al.
Ransomware has become an increasingly popular type of malware over the past decade and continues to rise in popularity due to its high profitability. Organisations and enterprises have become prime targets for ransomware as they are more likely to succumb to ransom demands as part of operating expenses to counter the cost incurred from downtime. Despite the prevalence of ransomware as a threat towards organisations, there is very little information outlining how ransomware affects Windows Server environments, and particularly its proprietary domain services such as Active Directory. Hence, we aim to increase the cyber situational awareness of organisations and corporations that utilise these environments. Dynamic analysis was performed using three ransomware variants to uncover how crypto-ransomware affects Windows Server-specific services and processes. Our work outlines the practical investigation undertaken as WannaCry, TeslaCrypt, and Jigsaw were acquired and tested against several domain services. The findings showed that none of the three variants stopped the processes and decidedly left all domain services untouched. However, although the services remained operational, they became uniquely dysfunctional as ransomware encrypted the files pertaining to those services.
CR · Dec 19, 2021
Privacy-preserving and Trusted Threat Intelligence Sharing using Distributed Ledgers
Hisham Ali, Pavlos Papadopoulos, Jawad Ahmad et al.
Threat information sharing is considered one of the proactive defensive approaches for enhancing the overall security of trusted partners. Trusted partner organizations can provide access to past and current cybersecurity threats to reduce the risk of a potential cyberattack. The requirements for threat information sharing range from simplistic sharing of documents to threat intelligence sharing. Therefore, the storage and sharing of highly sensitive threat information raises considerable concerns regarding constructing a secure, trusted threat information exchange infrastructure. Establishing a trusted ecosystem for threat sharing will promote the validity, security, anonymity, scalability, latency efficiency, and traceability of the stored information that protects it from unauthorized disclosure. This paper proposes a system that ensures the security principles mentioned above by utilizing a distributed ledger technology that provides secure decentralized operations through smart contracts and provides a privacy-preserving ecosystem for threat information storage and sharing regarding the MITRE ATT&CK framework.
CR · Oct 5, 2021
Evaluating Tooling and Methodology when Analysing Bitcoin Mixing Services After Forensic Seizure
Edward Henry Young, Christos Chrysoulas, Nikolaos Pitropakis et al.
Little or no research has been directed at the forensic analysis of Bitcoin mixing or 'tumbling' services themselves. This work examines effective tooling and methodology for recovering forensic artifacts from two privacy-focused mixing services: Obscuro, which uses the secure enclave on Intel chips to provide enhanced confidentiality, and Wasabi Wallet, which uses CoinJoin to mix and obfuscate cryptocurrencies. These wallets were set up on VMs, and several forensic tools were then used to examine the VM images for relevant forensic artifacts. The tools recovered a broad range of forensic artifacts, and both network forensics and log files proved to be useful sources of artifacts for deanonymizing these mixing services.
CR · Sep 17, 2021
GLASS: Towards Secure and Decentralized eGovernance Services using IPFS
Christos Chrysoulas, Amanda Thomson, Nikolaos Pitropakis et al.
The continuously advancing digitization has provided answers to the bureaucratic problems faced by eGovernance services. Although this innovation has led them to an era of automation, it has also broadened the attack surface and made them a popular target for cyber attacks. eGovernance services utilize the internet, which is currently a location-addressed system where whoever controls the location controls not only the content itself but also the integrity of, and access to, that content. We propose GLASS, a decentralised solution which combines the InterPlanetary File System (IPFS) with Distributed Ledger technology and Smart Contracts to secure eGovernance services. We also create a testbed environment where we measure the IPFS performance.
LG · Apr 26, 2021
Launching Adversarial Attacks against Network Intrusion Detection Systems for IoT
Pavlos Papadopoulos, Oliver Thornewill von Essen, Nikolaos Pitropakis et al.
As the internet continues to be populated with new devices and emerging technologies, the attack surface grows exponentially. Technology is shifting towards a profit-driven Internet of Things market where security is an afterthought. Traditional defending approaches are no longer sufficient to detect both known and unknown attacks with high accuracy. Machine learning intrusion detection systems have proven their success in identifying unknown attacks with high precision. Nevertheless, machine learning models are also vulnerable to attacks. Adversarial examples can be used to evaluate the robustness of a designed model before it is deployed. Further, using adversarial examples is critical to creating a robust model designed for an adversarial environment. Our work evaluates both traditional machine learning and deep learning models' robustness using the Bot-IoT dataset. Our methodology included two main approaches: first, label poisoning, used to cause incorrect classification by the model; second, the fast gradient sign method, used to evade detection measures. The experiments demonstrated that an attacker could manipulate or circumvent detection with significant probability.
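Label poisoning can be pictured as flipping a fraction of the training labels before a model is fitted. The sketch below assumes binary labels (0 for benign, 1 for attack) and a uniformly random choice of which samples to flip; the function name, the flip strategy, and the toy labels are assumptions and may differ from what the paper did.

```python
import random

def flip_labels(labels, fraction, seed=0):
    """Return a copy of binary labels with the given fraction flipped at random."""
    rng = random.Random(seed)
    n_flip = int(len(labels) * fraction)
    # Choose distinct positions so exactly n_flip labels change.
    flip_positions = rng.sample(range(len(labels)), n_flip)
    poisoned = list(labels)
    for i in flip_positions:
        poisoned[i] = 1 - poisoned[i]
    return poisoned

# Toy labels only; 20% of them are flipped.
clean_labels = [0, 1] * 50
poisoned_labels = flip_labels(clean_labels, fraction=0.2)
num_changed = sum(a != b for a, b in zip(clean_labels, poisoned_labels))
```

Training the same classifier on `clean_labels` and on `poisoned_labels`, then comparing test metrics, is one simple way to measure how sensitive a model is to this kind of tampering.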
LG · Apr 12, 2021
Practical Defences Against Model Inversion Attacks for Split Neural Networks
Tom Titcombe, Adam J. Hall, Pavlos Papadopoulos et al.
We describe a threat model under which a split network-based federated learning system is susceptible to a model inversion attack by a malicious computational server. We demonstrate that the attack can be successfully performed with limited knowledge of the data distribution by the attacker. We propose a simple additive noise method to defend against model inversion, finding that the method can significantly reduce attack efficacy at an acceptable accuracy trade-off on MNIST. Furthermore, we show that NoPeekNN, an existing defensive method, protects different information from exposure, suggesting that a combined defence is necessary to fully protect private user data.
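An additive noise defence of this kind can be sketched as perturbing the intermediate activations on the data owner's side before they are sent to the computational server. The choice of Laplace noise, the `scale` value, and the function names are assumptions made for illustration, since the abstract only says the noise is additive.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def add_noise(activations, scale, rng=None):
    """Perturb each intermediate activation before it leaves the data owner."""
    rng = rng or random.Random()
    return [a + laplace_noise(scale, rng) for a in activations]

# Hypothetical output of the data owner's model segment for one sample.
rng = random.Random(0)
intermediate = [0.12, -0.40, 0.93, 0.05]
protected = add_noise(intermediate, scale=0.5, rng=rng)
```

A larger `scale` hides more of the input from an inverting server but also degrades the downstream prediction, which is the accuracy trade-off the abstract refers to.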
LG · Apr 1, 2021
PyVertical: A Vertical Federated Learning Framework for Multi-headed SplitNN
Daniele Romanini, Adam James Hall, Pavlos Papadopoulos et al.
We introduce PyVertical, a framework supporting vertical federated learning using split neural networks. The proposed framework allows a data scientist to train neural networks on data features vertically partitioned across multiple owners while keeping raw data on an owner's device. To link entities shared across different datasets' partitions, we use Private Set Intersection on IDs associated with data points. To demonstrate the validity of the proposed framework, we present the training of a simple dual-headed split neural network for a MNIST classification task, with data samples vertically distributed across two data owners and a data scientist.
CR · Mar 29, 2021
Privacy and Trust Redefined in Federated Machine Learning
Pavlos Papadopoulos, Will Abramson, Adam J. Hall et al.
A common privacy issue in traditional machine learning is that data needs to be disclosed for the training procedures. In situations with highly sensitive data such as healthcare records, accessing this information is challenging and often prohibited. Luckily, privacy-preserving technologies have been developed to overcome this hurdle by distributing the computation of the training and ensuring data privacy for the data owners. However, distributing the computation to multiple participating entities introduces new privacy complications and risks. In this paper, we present a privacy-preserving decentralised workflow that facilitates trusted federated learning among participants. Our proof-of-concept defines a trust framework instantiated using decentralised identity technologies being developed under Hyperledger projects Aries/Indy/Ursa. Only entities in possession of Verifiable Credentials issued from the appropriate authorities are able to establish secure, authenticated communication channels authorised to participate in a federated learning workflow related to mental health data.
CR · Nov 18, 2020
A Privacy-Preserving Healthcare Framework Using Hyperledger Fabric
Charalampos Stamatellis, Pavlos Papadopoulos, Nikolaos Pitropakis et al.
Electronic health record (EHR) management systems require the adoption of effective technologies when health information is being exchanged. Current management approaches often face risks that may expose medical record storage solutions to common security attack vectors. However, healthcare-oriented blockchain solutions can provide a decentralized, anonymous and secure EHR handling approach. This paper presents PREHEALTH, a privacy-preserving EHR management solution that uses distributed ledger technology and an Identity Mixer (Idemix). The paper describes a proof-of-concept implementation that uses the Hyperledger Fabric's permissioned blockchain framework. The proposed solution is able to store patient records effectively whilst providing anonymity and unlinkability. Experimental performance evaluation results demonstrate the scheme's efficiency and feasibility for real-world scale deployment.
CR · Sep 10, 2020
Review and Critical Analysis of Privacy-preserving Infection Tracking and Contact Tracing
William J Buchanan, Muhammad Ali Imran, Masood Ur-Rehman et al.
The outbreak of viruses has necessitated contact tracing and infection tracking methods. Despite various efforts, there is currently no standard scheme for tracing and tracking. Many nations of the world have therefore developed their own ways in which carriers of disease can be tracked and their contacts traced. These are generalized methods developed either in a distributed manner, giving citizens control of their identity, or in a centralised manner, where a health authority gathers data on those who are carriers. This paper outlines some of the most significant approaches that have been established for contact tracing around the world. A comprehensive review of the key enabling methods used to realise the infrastructure around these infection tracking and contact tracing methods is also presented, and recommendations are made for the most effective way to develop such a practice.
CR · Aug 14, 2020
Privacy Preserving Passive DNS
Pavlos Papadopoulos, Nikolaos Pitropakis, William J. Buchanan et al.
The Domain Name System (DNS) was created to map easily remembered names to the IP addresses of web servers. When it was initially created, security was not a major concern; nowadays, this lack of inherent security and trust has exposed the global DNS infrastructure to malicious actors. The passive DNS data collection process creates a database containing various DNS data elements, some of which are personal and need to be protected to preserve the privacy of the end users. To this end, we propose the use of distributed ledger technology. We use Hyperledger Fabric to create a permissioned blockchain, which only authorized entities can access. The proposed solution supports queries for storing and retrieving data from the blockchain ledger, allowing the use of the passive DNS database for further analysis, e.g. for the identification of malicious domain names. Additionally, it effectively protects the DNS personal data from unauthorized entities, including administrators who could act as malicious insiders, and allows only the data owners to perform queries over these data. We evaluated our proposed solution by creating a proof-of-concept experimental setup that passively collects DNS data from a network and then uses the distributed ledger technology to store the data in an immutable ledger, thus providing a full historical overview of all the records.
CR · Jun 3, 2020
A Distributed Trust Framework for Privacy-Preserving Machine Learning
Will Abramson, Adam James Hall, Pavlos Papadopoulos et al.
When training a machine learning model, it is standard procedure for the researcher to have full knowledge of both the data and the model. However, this engenders a lack of trust between data owners and data scientists. Data owners are justifiably reluctant to relinquish control of private information to third parties. Privacy-preserving techniques distribute computation in order to ensure that data remains in the control of the owner while learning takes place. However, architectures distributed amongst multiple agents introduce an entirely new set of security and trust complications. These include data poisoning and model theft. This paper outlines a distributed infrastructure which is used to facilitate peer-to-peer trust between distributed agents collaboratively performing a privacy-preserving workflow. Our outlined prototype sets industry gatekeepers and governance bodies as credential issuers. Before participating in the distributed learning workflow, malicious actors must first negotiate valid credentials. We detail a proof of concept using Hyperledger Aries, Decentralised Identifiers (DIDs) and Verifiable Credentials (VCs) to establish a distributed trust architecture during a privacy-preserving machine learning experiment. Specifically, we utilise secure and authenticated DID communication channels in order to facilitate a federated learning workflow related to mental health care data.
CR · May 13, 2020
Phishing URL Detection Through Top-level Domain Analysis: A Descriptive Approach
Orestis Christou, Nikolaos Pitropakis, Pavlos Papadopoulos et al.
Phishing is considered to be one of the most prevalent cyber-attacks because of its immense flexibility and alarmingly high success rate. Even with adequate training and high situational awareness, it can still be hard for users to continually be aware of the URL of the website they are visiting. Traditional detection methods rely on blocklists and content analysis, both of which require time-consuming human verification. Thus, there have been attempts focusing on the predictive filtering of such URLs. This study aims to develop a machine-learning model to detect fraudulent URLs which can be used within the Splunk platform. Inspired by similar approaches in the literature, we trained the SVM and Random Forests algorithms using malicious and benign datasets found in the literature and one dataset that we created. We evaluated the algorithms' performance with precision and recall, reaching up to 85% precision and 87% recall in the case of Random Forests, while SVM achieved up to 90% precision and 88% recall using only descriptive features.
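For reference, precision and recall for a binary phishing classifier can be computed directly from true and predicted labels, as in the short sketch below. The function name and the toy labels are illustrative and are not the study's data.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels: 1 = phishing URL, 0 = benign URL.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
```

Here three phishing URLs are caught, one benign URL is flagged by mistake, and one phishing URL is missed, so both values come out at 0.75.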