CR · Jun 21, 2022
Transferable Graph Backdoor Attack
Shuiqiao Yang, Bao Gia Doan, Paul Montague et al. · cambridge
Graph Neural Networks (GNNs) have achieved tremendous success in many graph mining tasks benefitting from the message passing strategy that fuses the local structure and node features for better graph representation learning. Despite the success of GNNs, and similar to other types of deep neural networks, GNNs are found to be vulnerable to unnoticeable perturbations on both graph structure and node features. Many adversarial attacks have been proposed to disclose the fragility of GNNs under different perturbation strategies to create adversarial examples. However, vulnerability of GNNs to successful backdoor attacks was only shown recently. In this paper, we disclose the TRAP attack, a Transferable GRAPh backdoor attack. The core attack principle is to poison the training dataset with perturbation-based triggers that can lead to an effective and transferable backdoor attack. The perturbation trigger for a graph is generated by performing the perturbation actions on the graph structure via a gradient-based score matrix from a surrogate model. Compared with prior works, the TRAP attack is different in several ways: i) it exploits a surrogate Graph Convolutional Network (GCN) model to generate perturbation triggers for a black-box backdoor attack; ii) it generates sample-specific perturbation triggers which do not have a fixed pattern; and iii) the attack transfers, for the first time in the context of GNNs, to different GNN models when trained with the forged poisoned training dataset. Through extensive evaluations on four real-world datasets, we demonstrate the effectiveness of the TRAP attack in building transferable backdoors in four different popular GNNs.
CR · Jul 24, 2024 · Code
Synthetic Trajectory Generation Through Convolutional Neural Networks
Jesse Merhi, Erik Buchholz, Salil S. Kanhere
Location trajectories provide valuable insights for applications from urban planning to pandemic control. However, mobility data can also reveal sensitive information about individuals, such as political opinions, religious beliefs, or sexual orientations. Existing privacy-preserving approaches for publishing this data face a significant utility-privacy trade-off. Releasing synthetic trajectory data generated through deep learning offers a promising solution. Due to the trajectories' sequential nature, most existing models are based on recurrent neural networks (RNNs). However, research in generative adversarial networks (GANs) largely employs convolutional neural networks (CNNs) for image generation. This discrepancy raises the question of whether advances in computer vision can be applied to trajectory generation. In this work, we introduce a Reversible Trajectory-to-CNN Transformation (RTCT) that adapts trajectories into a format suitable for CNN-based models. We integrated this transformation with the well-known DCGAN in a proof-of-concept (PoC) and evaluated its performance against an RNN-based trajectory GAN using four metrics across two datasets. The PoC was superior in capturing spatial distributions compared to the RNN model but had difficulty replicating sequential and temporal properties. Although the PoC's utility is not sufficient for practical applications, the results demonstrate the transformation's potential to facilitate the use of CNNs for trajectory generation, opening up avenues for future research. To support continued research, all source code has been made available under an open-source license.
CV · Jan 16, 2023
Diverse Multimedia Layout Generation with Multi Choice Learning
David D. Nguyen, Surya Nepal, Salil S. Kanhere
Designing visually appealing layouts for multimedia documents containing text, graphs and images requires a form of creative intelligence. Modelling the generation of layouts has recently gained attention due to its importance in aesthetics and communication style. In contrast to standard prediction tasks, there are a range of acceptable layouts which depend on user preferences. For example, a poster designer may prefer logos on the top-left while another prefers logos on the bottom-right. Both are correct choices yet existing machine learning models treat layouts as a single choice prediction problem. In such situations, these models would simply average over all possible choices given the same input forming a degenerate sample. In the above example, this would form an unacceptable layout with a logo in the centre. In this paper, we present an auto-regressive neural network architecture, called LayoutMCL, that uses multi-choice prediction and winner-takes-all loss to effectively stabilise layout generation. LayoutMCL avoids the averaging problem by using multiple predictors to learn a range of possible options for each layout object. This enables LayoutMCL to generate multiple and diverse layouts from a single input which is in contrast with existing approaches which yield similar layouts with minor variations. Through quantitative benchmarks on real data (magazine, document and mobile app layouts), we demonstrate that LayoutMCL reduces Fréchet Inception Distance (FID) by 83-98% and generates significantly more diversity in comparison to existing approaches.
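The averaging problem and the winner-takes-all idea described in this abstract can be illustrated with a small sketch. This is not the LayoutMCL implementation: the function name, the use of squared error, and the representation of a layout object as an (x, y) position are assumptions made only for illustration.

```python
def winner_takes_all_loss(predictions, target):
    """Return (loss, winner_index) for one training example.

    predictions: list of K candidate (x, y) positions, one per predictor head
                 (hypothetical representation of a layout object).
    target:      the ground-truth (x, y) position.
    Only the closest head is penalised, so the other heads remain free to
    specialise on different acceptable layouts.
    """
    def squared_error(pred, true):
        return sum((p - t) ** 2 for p, t in zip(pred, true))

    errors = [squared_error(p, target) for p in predictions]
    winner = min(range(len(errors)), key=errors.__getitem__)
    return errors[winner], winner


# Two heads: one places the logo top-left, the other bottom-right.
heads = [(0.1, 0.1), (0.9, 0.9)]
loss_a, winner_a = winner_takes_all_loss(heads, (0.1, 0.1))  # head 0 wins
loss_b, winner_b = winner_takes_all_loss(heads, (0.9, 0.9))  # head 1 wins

# A single averaged predictor would sit in the centre and fit neither target.
loss_avg, _ = winner_takes_all_loss([(0.5, 0.5)], (0.1, 0.1))
```

In a trained model, gradients would flow only through the winning head for each example, which is what lets different heads cover different user preferences.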
SI · Aug 24, 2023
False Information, Bots and Malicious Campaigns: Demystifying Elements of Social Media Manipulations
Mohammad Majid Akhtar, Rahat Masood, Muhammad Ikram et al.
The rapid spread of false information and persistent manipulation attacks on online social networks (OSNs), often for political, ideological, or financial gain, has affected the openness of OSNs. While researchers from various disciplines have investigated different manipulation-triggering elements of OSNs (such as understanding information diffusion on OSNs or detecting automated behavior of accounts), these works have not been consolidated to present a comprehensive overview of the interconnections among these elements. Notably, user psychology, the prevalence of bots, and their tactics in relation to false information detection have been overlooked in previous research. To address this research gap, this paper synthesizes insights from various disciplines to provide a comprehensive analysis of the manipulation landscape. By integrating the primary elements of social media manipulation (SMM), including false information, bots, and malicious campaigns, we extensively examine each SMM element. Through a systematic investigation of prior research, we identify commonalities, highlight existing gaps, and extract valuable insights in the field. Our findings underscore the urgent need for interdisciplinary research to effectively combat social media manipulations, and our systematization can guide future research efforts and assist OSN providers in ensuring the safety and integrity of their platforms.
CR · Aug 15, 2022
Deception for Cyber Defence: Challenges and Opportunities
David Liebowitz, Surya Nepal, Kristen Moore et al.
Deception is rapidly growing as an important tool for cyber defence, complementing existing perimeter security measures to rapidly detect breaches and data theft. One of the factors limiting the use of deception has been the cost of generating realistic artefacts by hand. Recent advances in Machine Learning have, however, created opportunities for scalable, automated generation of realistic deceptions. This vision paper describes the opportunities and challenges involved in developing models to mimic many common elements of the IT stack for deception effects.
CL · Mar 14, 2022
Can pre-trained Transformers be used in detecting complex sensitive sentences? -- A Monsanto case study
Roelien C. Timmer, David Liebowitz, Surya Nepal et al.
Every organisation releases information in a variety of forms ranging from annual reports to legal proceedings. Such documents may contain sensitive information and releasing them openly may lead to the leakage of confidential information. Detection of sentences that contain sensitive information in documents can help organisations prevent the leakage of valuable confidential information. This is especially challenging when such sentences contain a substantial amount of information or are paraphrased versions of known sensitive content. Current approaches to sensitive information detection in such complex settings are keyword-based or use standard machine learning models. In this paper, we explore whether pre-trained transformer models are well suited to detect complex sensitive information. Pre-trained transformers are typically trained on an enormous amount of text and therefore readily learn grammar, structure and other linguistic features, making them particularly attractive for this task. Through our experiments on the Monsanto trial data set, we observe that the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model performs better than traditional models. We experimented with four different categories of documents in the Monsanto dataset and observed that BERT achieves better F2 scores by 24.13% to 65.79% for GHOST, 30.14% to 54.88% for TOXIC, 39.22% for CHEMI, and 53.57% for REGUL compared to existing sensitive information detection models.
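For readers unfamiliar with the F2 score reported above, the following is a minimal sketch of the general F-beta formula, of which F2 is the case beta = 2 (recall weighted more heavily than precision). The function name and the example values are illustrative only and are not taken from the paper.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 favours recall, which suits sensitive-information detection,
    where a missed sensitive sentence is usually costlier than a false alarm.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is detected correctly
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values: a detector that finds everything but is right half the time.
score = f_beta(precision=0.5, recall=1.0)  # 5/6, roughly 0.83
```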
41.5 · LG · Apr 2
Label Shift Estimation With Incremental Prior Update
Yunrui Zhang, Gustavo Batista, Salil S. Kanhere
An assumption often made in supervised learning is that the training and testing sets have the same label distribution. However, in real-life scenarios, this assumption rarely holds. For example, medical diagnosis result distributions change over time and across locations; fraud detection models must adapt as patterns of fraudulent activity shift; the category distribution of social media posts changes based on trending topics and user demographics. In the task of label shift estimation, the goal is to estimate the changing label distribution $p_t(y)$ in the testing set, assuming the likelihood $p(x|y)$ does not change, implying no concept drift. In this paper, we propose a new approach for post-hoc label shift estimation, unlike previous methods that perform moment matching with confusion matrix estimated from a validation set or maximize the likelihood of the new data with an expectation-maximization algorithm. We aim to incrementally update the prior on each sample, adjusting each posterior for more accurate label shift estimation. The proposed method is based on intuitive assumptions on classifiers that are generally true for modern probabilistic classifiers. The proposed method relies on a weaker notion of calibration compared to other methods. As a post-hoc approach for label shift estimation, the proposed method is versatile and can be applied to any black-box probabilistic classifier. Experiments on CIFAR-10 and MNIST show that the proposed method consistently outperforms the current state-of-the-art maximum likelihood-based methods under different calibrations and varying intensities of label shift.
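To make the setting concrete, here is a minimal sketch of the expectation-maximization baseline that the abstract mentions as prior work. It is not the incremental method proposed in the paper. It assumes a classifier that outputs posteriors under the training prior and a strictly positive source prior; all names are hypothetical.

```python
def adjust_posterior(posterior, source_prior, target_prior):
    """Re-weight one posterior p_s(y|x) for a new prior, assuming p(x|y) is unchanged."""
    weighted = [p * t / s for p, s, t in zip(posterior, source_prior, target_prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


def em_label_shift(posteriors, source_prior, iterations=100):
    """Estimate the test-set label distribution p_t(y) with EM.

    posteriors:   classifier outputs on unlabeled test samples (one list per sample).
    source_prior: label distribution of the training set (all entries > 0).
    """
    target_prior = list(source_prior)  # start from the training prior
    n_classes = len(source_prior)
    for _ in range(iterations):
        # E-step: adjust every posterior using the current prior estimate.
        adjusted = [adjust_posterior(p, source_prior, target_prior) for p in posteriors]
        # M-step: the new prior is the mean adjusted posterior.
        target_prior = [sum(a[k] for a in adjusted) / len(adjusted) for k in range(n_classes)]
    return target_prior


# Three confident class-0 predictions and one class-1 prediction.
estimate = em_label_shift([[1.0, 0.0]] * 3 + [[0.0, 1.0]], source_prior=[0.5, 0.5])
```

The quality of such an estimate depends on how well calibrated the classifier is, which is the dependence the abstract says the proposed method weakens.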
SI · Sep 7, 2022
Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News
Mohammad Majid Akhtar, Bibhas Sharma, Ishan Karunanayake et al.
COVID-19 impacted every part of the world, although the misinformation about the outbreak traveled faster than the virus. Misinformation spread through online social networks (OSNs) often led people away from following correct medical practices. In particular, OSN bots have been a primary source of disseminating false information and initiating cyber propaganda. Existing work neglects the presence of bots that act as a catalyst in the spread and focuses on fake news detection in 'articles shared in posts' rather than the post (textual) content. Most work on misinformation detection uses manually labeled datasets that are hard to scale for building their predictive models. In this research, we overcome this challenge of data scarcity by proposing an automated approach for labeling data using verified fact-checked statements on a Twitter dataset. In addition, we combine textual features with user-level features (such as followers count and friends count) and tweet-level features (such as number of mentions, hashtags and urls in a tweet) to act as additional indicators to detect misinformation. Moreover, we analyze the presence of bots in tweets and show that bots change their behavior over time and are most active during the misinformation campaign. We collected 10.22 Million COVID-19 related tweets and used our annotation model to build an extensive and original ground truth dataset for classification purposes. We utilize various machine learning models to accurately detect misinformation and our best classification model achieves precision (82%), recall (96%), and false positive rate (3.58%). Also, our bot analysis indicates that bots generated approximately 10% of misinformation tweets. Our methodology results in substantial exposure of false information, thus improving the trustworthiness of information disseminated through social media platforms.
LG · Jul 18, 2023
Discretization-based ensemble model for robust learning in IoT
Anahita Namvar, Chandra Thapa, Salil S. Kanhere
IoT device identification is the process of recognizing and verifying IoT devices connected to the network. This is an essential process for ensuring that only authorized devices can access the network, and it is necessary for network management and maintenance. In recent years, machine learning models have been used widely for automating the process of identifying devices in the network. However, these models are vulnerable to adversarial attacks that can compromise their accuracy and effectiveness. To better secure device identification models, discretization techniques reduce the sensitivity of machine learning models to adversarial attacks, contributing to the stability and reliability of the model. Ensemble methods, on the other hand, combine multiple heterogeneous models to reduce the impact of remaining noise or errors in the model. Therefore, in this paper, we integrate discretization techniques and ensemble methods and examine their effect on model robustness against adversarial attacks. In other words, we propose a discretization-based ensemble stacking technique to improve the security of our ML models. We evaluate the performance of different ML-based IoT device identification models against white-box and black-box attacks using a real-world dataset comprising network traffic from 28 IoT devices. We demonstrate that the proposed method improves the robustness of models for IoT device identification.
CR · Nov 8, 2023
Local Differential Privacy for Smart Meter Data Sharing
Yashothara Shanmugarasa, M. A. P. Chamikara, Hye-young Paik et al.
Energy disaggregation techniques, which use smart meter data to infer appliance energy usage, can provide consumers and energy companies valuable insights into energy management. However, these techniques also present privacy risks, such as the potential for behavioral profiling. Local differential privacy (LDP) methods provide strong privacy guarantees with high efficiency in addressing privacy concerns. However, existing LDP methods focus on protecting aggregated energy consumption data rather than individual appliances. Furthermore, these methods do not consider the fact that smart meter data are a form of streaming data, and its processing methods should account for time windows. In this paper, we propose a novel LDP approach (named LDP-SmartEnergy) that utilizes randomized response techniques with sliding windows to facilitate the sharing of appliance-level energy consumption data over time while not revealing individual users' appliance usage patterns. Our evaluations show that LDP-SmartEnergy runs efficiently compared to baseline methods. The results also demonstrate that our solution strikes a balance between protecting privacy and maintaining the utility of data for effective analysis.
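As background for the randomized response techniques named above, the sketch below shows classic binary randomized response and its debiased aggregate estimate. It is not LDP-SmartEnergy itself: the sliding-window handling is omitted, and the idea of encoding an appliance's state as a single on/off bit is an assumption made for illustration.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(bit, epsilon, rng):
    """Report the true on/off bit with probability p, otherwise report its flip."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def estimate_true_rate(reported_bits, epsilon):
    """Unbiased estimate of the fraction of true 1s from the noisy reports."""
    p = keep_probability(epsilon)
    observed = sum(reported_bits) / len(reported_bits)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Hypothetical usage: 1000 households, 30% with the appliance switched on.
rng = random.Random(42)
true_bits = [1] * 300 + [0] * 700
reports = [randomize_bit(b, epsilon=1.0, rng=rng) for b in true_bits]
estimated_rate = estimate_true_rate(reports, epsilon=1.0)
```

Each individual report is deniable, while the aggregate estimate approaches the true rate as the number of reports grows; smaller epsilon gives stronger privacy and a noisier estimate.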
LG · Jan 16, 2023
Masked Vector Quantization
David D. Nguyen, David Leibowitz, Surya Nepal et al.
Generative models with discrete latent representations have recently demonstrated an impressive ability to learn complex high-dimensional data distributions. However, their performance relies on a long sequence of tokens per instance and a large number of codebook entries, resulting in long sampling times and considerable computation to fit the categorical posterior. To address these issues, we propose the Masked Vector Quantization (MVQ) framework which increases the representational capacity of each code vector by learning mask configurations via a stochastic winner-takes-all training regime called Multiple Hypothese Dropout (MH-Dropout). On ImageNet 64$\times$64, MVQ reduces FID in existing vector quantization architectures by up to $68\%$ at 2 tokens per instance and $57\%$ at 5 tokens. These improvements widen as the number of codebook entries is reduced and allow for a $7\textit{--}45\times$ speed-up in token sampling during inference. As an additional benefit, we find that smaller latent spaces lead to MVQ identifying transferable visual representations, multiple of which can be smoothly combined.
CR · Sep 23, 2024 · Code
Demystifying Trajectory Recovery From Ash: An Open-Source Evaluation and Enhancement
Nicholas D'Silva, Toran Shahi, Øyvind Timian Dokk Husveg et al.
Once analysed, location trajectories can provide valuable insights beneficial to various applications. However, such data is also highly sensitive, rendering it susceptible to privacy risks in the event of mismanagement, for example, revealing an individual's identity, home address, or political affiliations. Hence, ensuring that privacy is preserved for this data is a priority. One commonly taken measure to mitigate this concern is aggregation. Previous work by Xu et al. shows that trajectories are still recoverable from anonymised and aggregated datasets. However, the study lacks implementation details, obfuscating the mechanisms of the attack. Additionally, the attack was evaluated on commercial non-public datasets, rendering the results and subsequent claims unverifiable. This study reimplements the trajectory recovery attack from scratch and evaluates it on two open-source datasets, detailing the preprocessing steps and implementation. Results confirm that privacy leakage still exists despite common anonymisation and aggregation methods but also indicate that the initial accuracy claims may have been overly ambitious. We release all code as open-source to ensure the results are entirely reproducible and, therefore, verifiable. Moreover, we propose a stronger attack by designing a series of enhancements to the baseline attack. These enhancements yield up to 16% higher accuracy, providing an improved benchmark for future research in trajectory recovery methods. Our improvements also enable online execution of the attack, allowing partial attacks on larger datasets previously considered unprocessable, thereby furthering the extent of privacy leakage. The findings emphasise the importance of using strong privacy-preserving mechanisms when releasing aggregated mobility data and not solely relying on aggregation as a means of anonymisation.
62.0 · CV · Apr 21
If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems
Jiamin Chang, Minhui Xue, Ruoxi Sun et al.
Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive and reason over real-world scenes. Within this context, environmental signals such as traffic lights are essential in-band signals that can and should influence agent behavior. However, similar signals could also be crafted to operate as misleading visual injections, overriding user intent and posing security risks. This duality creates a fundamental challenge: agents must respond to legitimate environmental cues while remaining robust to misleading ones. We refer to this tension as trust boundary confusion. To study this behavior, we design a dual-intent dataset and evaluation framework, through which we show that current LVLM-based agents fail to reliably balance this trade-off, either ignoring useful signals or following harmful ones. We systematically evaluate 7 LVLM agents across multiple embodied settings under both structure-based and noise-based visual injections. To address these vulnerabilities, we propose a multi-agent defense framework that separates perception from decision-making to dynamically assess the reliability of visual inputs. Our approach significantly reduces misleading behaviors while preserving correct responses and provides robustness guarantees under adversarial perturbations. The code of the evaluation framework and artifacts are made available at https://anonymous.4open.science/r/Visual-Prompt-Inject.
LG · Jul 9, 2025 · Code
Instance-Wise Monotonic Calibration by Constrained Transformation
Yunrui Zhang, Gustavo Batista, Salil S. Kanhere
Deep neural networks often produce miscalibrated probability estimates, leading to overconfident predictions. A common approach for calibration is fitting a post-hoc calibration map on unseen validation data that transforms predicted probabilities. A key desirable property of the calibration map is instance-wise monotonicity (i.e., preserving the ranking of probability outputs). However, most existing post-hoc calibration methods do not guarantee monotonicity. Previous monotonic approaches either use an under-parameterized calibration map with limited expressive ability or rely on black-box neural networks, which lack interpretability and robustness. In this paper, we propose a family of novel monotonic post-hoc calibration methods, which employs a constrained calibration map parameterized linearly with respect to the number of classes. Our proposed approach ensures expressiveness, robustness, and interpretability while preserving the relative ordering of the probability output by formulating the proposed calibration map as a constrained optimization problem. Our proposed methods achieve state-of-the-art performance across datasets with different deep neural network models, outperforming existing calibration methods while being data and computation-efficient. Our code is available at https://github.com/YunruiZhang/Calibration-by-Constrained-Transformation
LG · Mar 26, 2025 · Code
Revisit Time Series Classification Benchmark: The Impact of Temporal Information for Classification
Yunrui Zhang, Gustavo Batista, Salil S. Kanhere
Time series classification is usually regarded as a distinct task from tabular data classification due to the importance of temporal information. However, in this paper, by performing permutation tests that disrupt temporal information on the UCR time series classification archive, the most widely used benchmark for time series classification, we identify a significant proportion of datasets where temporal information has little to no impact on classification. Many of these datasets are tabular in nature or rely mainly on tabular features, leading to potentially biased evaluations of time series classifiers focused on temporal information. To address this, we propose UCR Augmented, a benchmark based on the UCR time series classification archive designed to evaluate classifiers' ability to extract and utilize temporal information. Testing classifiers from seven categories on this benchmark revealed notable shifts in performance rankings. Some previously overlooked approaches perform well, while others see their performance decline significantly when temporal information is crucial. UCR Augmented provides a more robust framework for assessing time series classifiers, ensuring fairer evaluations. Our code is available at https://github.com/YunruiZhang/Revisit-Time-Series-Classification-Benchmark.
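The permutation test mentioned above can be sketched as follows. This is an illustration, not the authors' code: it assumes equal-length univariate series and applies one shared random reordering of time steps to every series, so each position still carries the same "feature" across samples while the temporal order is destroyed. Whether the paper uses a shared or per-series permutation is not stated in the abstract.

```python
import random


def shuffle_time_steps(series_list, seed=0):
    """Apply the same random permutation of time indices to every series.

    If a classifier scores about the same on the shuffled data as on the
    original, it is not relying on temporal order for that dataset.
    """
    length = len(series_list[0])
    order = list(range(length))
    random.Random(seed).shuffle(order)  # one permutation shared by all series
    return [[series[i] for i in order] for series in series_list]


# Toy data: two short series.
original = [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14]]
shuffled = shuffle_time_steps(original, seed=1)
```

A full test would train and evaluate a classifier on both versions and compare accuracies; that step is omitted here because it depends on the classifier and dataset.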
LG · Oct 15, 2024 · Code
Adversarially Guided Stateful Defense Against Backdoor Attacks in Federated Deep Learning
Hassan Ali, Surya Nepal, Salil S. Kanhere et al.
Recent works have shown that Federated Learning (FL) is vulnerable to backdoor attacks. Existing defenses cluster submitted updates from clients and select the best cluster for aggregation. However, they often rely on unrealistic assumptions regarding client submissions and the sampled client population while choosing the best cluster. We show that in realistic FL settings, state-of-the-art (SOTA) defenses struggle to perform well against backdoor attacks in FL. To address this, we highlight that backdoored submissions are adversarially biased and overconfident compared to clean submissions. We, therefore, propose an Adversarially Guided Stateful Defense (AGSD) against backdoor attacks on Deep Neural Networks (DNNs) in FL scenarios. AGSD employs adversarial perturbations to a small held-out dataset to compute a novel metric, called the trust index, that guides the cluster selection without relying on any unrealistic assumptions regarding client submissions. Moreover, AGSD maintains a trust state history of each client that adaptively penalizes backdoored clients and rewards clean clients. In realistic FL settings, where SOTA defenses mostly fail to resist attacks, AGSD mostly outperforms all SOTA defenses with minimal drop in clean accuracy (5% in the worst case compared to best accuracy) even when (a) given a very small held-out dataset -- typically AGSD assumes 50 samples (<= 0.1% of the training data) and (b) no held-out dataset is available, and out-of-distribution data is used instead. For reproducibility, our code will be openly available at: https://github.com/hassanalikhatim/AGSD.
25.5 · CR · Mar 19
SoK: Practical Aspects of Releasing Differentially Private Graphs
Nicholas D'Silva, Surya Nepal, Salil S. Kanhere
Graph data is increasingly prevalent across domains, offering analytical value but raising significant privacy concerns. Edges may encode sensitive relationships, while node attributes may contain sensitive entity or personal data. Differential Privacy (DP) has gained traction for its strong guarantees, yet applying DP to graphs is challenging because of their complex relational structure, leading to trade-offs between privacy and utility. Existing methods vary in privacy definitions, utility goals, and contextual settings, complicating comparison. For practitioners, this is compounded by DP's interpretability issues, contributing to misleading protection claims. To address this, we propose a novel systemisation of existing methods tailored to practical considerations and adaptable to varying practitioner objectives. Our contributions include: (i) a comprehensive survey of differentially private graph release methods; (ii) identification of key vulnerabilities; and (iii) a practitioner-oriented, objective-based framework to guide the selection, interpretation, and sound evaluation of existing methods. We demonstrate the use of our systemisation through two exemplary scenarios in which we assume the role of a social network analyst, apply it, and conduct evaluations in accordance with our framework. Together, these two illustrative instantiations ultimately provide a unified benchmark for state-of-the-art methods in the social networks domain.
CR · Mar 12, 2024
SoK: Can Trajectory Generation Combine Privacy and Utility?
Erik Buchholz, Alsharif Abuadbba, Shuo Wang et al.
While location trajectories represent a valuable data source for analyses and location-based services, they can reveal sensitive information, such as political and religious preferences. Differentially private publication mechanisms have been proposed to allow for analyses under rigorous privacy guarantees. However, the traditional protection schemes suffer from a limiting privacy-utility trade-off and are vulnerable to correlation and reconstruction attacks. Synthetic trajectory data generation and release represent a promising alternative to protection algorithms. While initial proposals achieve remarkable utility, they fail to provide rigorous privacy guarantees. This paper proposes a framework for designing a privacy-preserving trajectory publication approach by defining five design goals, particularly stressing the importance of choosing an appropriate Unit of Privacy. Based on this framework, we briefly discuss the existing trajectory protection approaches, emphasising their shortcomings. This work focuses on the systematisation of the state-of-the-art generative models for trajectories in the context of the proposed framework. We find that no existing solution satisfies all requirements. Thus, we perform an experimental study evaluating the applicability of six sequential generative models to the trajectory domain. Finally, we conclude that a generative trajectory model providing semantic guarantees remains an open research question and propose concrete next steps for future research.
LG · Apr 7, 2024
Contextual Chart Generation for Cyber Deception
David D. Nguyen, David Liebowitz, Surya Nepal et al.
Honeyfiles are security assets designed to attract and detect intruders on compromised systems. Honeyfiles are a type of honeypot that mimic real, sensitive documents, creating the illusion of the presence of valuable data. Interaction with a honeyfile reveals the presence of an intruder, and can provide insights into their goals and intentions. Their practical use, however, is limited by the time, cost and effort associated with manually creating realistic content. The introduction of large language models has made high-quality text generation accessible, but honeyfiles contain a variety of content including charts, tables and images. This content needs to be plausible and realistic, as well as semantically consistent both within honeyfiles and with the real documents they mimic, to successfully deceive an intruder. In this paper, we focus on an important component of the honeyfile content generation problem: document charts. Charts are ubiquitous in corporate documents and are commonly used to communicate quantitative and scientific data. Existing image generation models, such as DALL-E, are rather prone to generating charts with incomprehensible text and unconvincing data. We take a multi-modal approach to this problem by combining two purpose-built generative models: a multitask Transformer and a specialized multi-head autoencoder. The Transformer generates realistic captions and plot text, while the autoencoder generates the underlying tabular data for the plot. To advance the field of automated honeyplot generation, we also release a new document-chart dataset and propose a novel metric Keyword Semantic Matching (KSM). This metric measures the semantic consistency between keywords of a corpus and a smaller bag of words. Extensive experiments demonstrate excellent performance against multiple large language models, including ChatGPT and GPT4.
SI · Feb 6, 2024
BotSSCL: Social Bot Detection with Self-Supervised Contrastive Learning
Mohammad Majid Akhtar, Navid Shadman Bhuiyan, Rahat Masood et al.
The detection of automated accounts, also known as "social bots", has been an increasingly important concern for online social networks (OSNs). While several methods have been proposed for detecting social bots, significant research gaps remain. First, current models exhibit limitations in detecting sophisticated bots that aim to mimic genuine OSN users. Second, these methods often rely on simplistic profile features, which are susceptible to manipulation. In addition to their vulnerability to adversarial manipulations, these models lack generalizability, resulting in subpar performance when trained on one dataset and tested on another. To address these challenges, we propose a novel framework for social Bot detection with Self-Supervised Contrastive Learning (BotSSCL). Our framework leverages contrastive learning to distinguish between social bots and humans in the embedding space to improve linear separability. The high-level representations derived by BotSSCL enhance its resilience to variations in data distribution and ensure generalizability. We evaluate BotSSCL's robustness against adversarial attempts to manipulate bot accounts to evade detection. Experiments on two datasets featuring sophisticated bots demonstrate that BotSSCL outperforms other supervised, unsupervised, and self-supervised baseline methods. We achieve approx. 6% and approx. 8% higher (F1) performance than SOTA on both datasets. In addition, BotSSCL also achieves 67% F1 when trained on one dataset and tested with another, demonstrating its generalizability. Lastly, BotSSCL increases adversarial complexity and only allows 4% success to the adversary in evading detection.
CR · Aug 1, 2025
Demo: TOSense -- What Did You Just Agree to?
Xinzhang Chen, Hassan Ali, Arash Shaghaghi et al.
Online services often require users to agree to lengthy and obscure Terms of Service (ToS), leading to information asymmetry and legal risks. This paper proposes TOSense, a Chrome extension that allows users to ask questions about ToS in natural language and get concise answers in real time. The system combines (i) a crawler, "tos-crawl", that automatically extracts ToS content, and (ii) a lightweight large language model pipeline: MiniLM for semantic retrieval and BART-encoder for answer relevance verification. To avoid expensive manual annotation, we present a novel Question Answering Evaluation Pipeline (QEP) that generates synthetic questions and verifies the correctness of answers using clustered topic matching. Experiments on five major platforms, Apple, Google, X (formerly Twitter), Microsoft, and Netflix, show the effectiveness of TOSense (with up to 44.5% accuracy) across varying numbers of topic clusters. During the demonstration, we will showcase TOSense in action. Attendees will be able to experience seamless extraction, interactive question answering, and instant indexing of new sites.
CR · Jun 11, 2025
What is the Cost of Differential Privacy for Deep Learning-Based Trajectory Generation?
Erik Buchholz, Natasha Fernandes, David D. Nguyen et al.
While location trajectories offer valuable insights, they also reveal sensitive personal information. Differential Privacy (DP) offers formal protection, but achieving a favourable utility-privacy trade-off remains challenging. Recent works explore deep learning-based generative models to produce synthetic trajectories. However, current models lack formal privacy guarantees and rely on conditional information derived from real data during generation. This work investigates the utility cost of enforcing DP in such models, addressing three research questions across two datasets and eleven utility metrics. (1) We evaluate how DP-SGD, the standard DP training method for deep learning, affects the utility of state-of-the-art generative models. (2) Since DP-SGD is limited to unconditional models, we propose a novel DP mechanism for conditional generation that provides formal guarantees and assess its impact on utility. (3) We analyse how model types (Diffusion, VAE, and GAN) affect the utility-privacy trade-off. Our results show that DP-SGD significantly impacts performance, although some utility remains if the dataset is sufficiently large. The proposed DP mechanism improves training stability, particularly when combined with DP-SGD, for unstable models such as GANs and on smaller datasets. Diffusion models yield the best utility without guarantees, but with DP-SGD, GANs perform best, indicating that the best non-private model is not necessarily optimal when targeting formal guarantees. In conclusion, DP trajectory generation remains a challenging task, and formal guarantees are currently only feasible with large datasets and in constrained use cases.
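For readers unfamiliar with DP-SGD, the following is a minimal sketch of its core step as generally described in the literature: clip each per-sample gradient to a fixed L2 norm, sum, add Gaussian noise, and average. It is not taken from the paper; the function name `dp_sgd_step`, the plain-list gradient representation, and the parameter values are assumptions for illustration, and privacy accounting is omitted entirely.

```python
import math
import random


def dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Return the noisy, averaged gradient for one DP-SGD step (illustrative only)."""
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down so that its L2 norm is at most clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # Gaussian noise with standard deviation noise_multiplier * clip_norm per coordinate.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


if __name__ == "__main__":
    # Hypothetical gradients for a batch of two samples.
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Clipping bounds each sample's influence on the update, which is what lets the added noise translate into a formal guarantee; it is also one source of the utility loss the abstract reports.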
CL · Nov 16, 2024
Comparison of Multilingual and Bilingual Models for Satirical News Detection of Arabic and English
Omar W. Abdalla, Aditya Joshi, Rahat Masood et al.
Satirical news is real news combined with a humorous comment or exaggerated content, and it often mimics the format and style of real news. However, satirical news is often misunderstood as misinformation, especially by individuals from different cultural and social backgrounds. This research addresses the challenge of distinguishing satire from truthful news by leveraging multilingual satire detection methods in English and Arabic. We explore both zero-shot and chain-of-thought (CoT) prompting using two language models, Jais-chat (13B) and LLaMA-2-chat (7B). Our results show that CoT prompting offers a significant advantage for the Jais-chat model over the LLaMA-2-chat model. Specifically, Jais-chat achieved the best performance, with an F1-score of 80% in English when using CoT prompting. These results highlight the importance of structured reasoning in CoT, which enhances contextual understanding and is vital for complex tasks like satire detection.
CR · May 8, 2024
Honeyfile Camouflage: Hiding Fake Files in Plain Sight
Roelien C. Timmer, David Liebowitz, Surya Nepal et al.
Honeyfiles are a particularly useful type of honeypot: fake files deployed to detect and infer information from malicious behaviour. This paper considers the challenge of naming honeyfiles so they are camouflaged when placed amongst real files in a file system. Based on cosine distances in semantic vector spaces, we develop two metrics for filename camouflage: one based on simple averaging and one on clustering with mixture fitting. We evaluate and compare the metrics, showing that both perform well on a publicly available GitHub software repository dataset.
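As a rough illustration of a cosine-distance score based on simple averaging, the sketch below computes the mean cosine distance between one candidate filename vector and the vectors of the real filenames around it. The abstract does not give the exact metric definitions, so this is only one plausible reading; the function names and the toy two-dimensional vectors are hypothetical, and a real system would obtain the vectors from a semantic embedding model.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance_score(candidate_vec, real_vecs):
    """Average cosine distance from a candidate filename vector to real filename vectors."""
    return sum(cosine_distance(candidate_vec, r) for r in real_vecs) / len(real_vecs)


if __name__ == "__main__":
    # Toy vectors standing in for filename embeddings (assumed, not from the paper).
    real = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_distance_score([1.0, 0.0], real))
```

Under this reading, a lower score means the candidate name sits closer to its neighbours in the vector space and is therefore better camouflaged.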
LG · Dec 18, 2023
Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions
David D. Nguyen, David Liebowitz, Surya Nepal et al.
In many real-world applications, from robotics to pedestrian trajectory prediction, there is a need to predict multiple real-valued outputs to represent several potential scenarios. Current deep learning techniques to address multiple-output problems are based on two main methodologies: (1) mixture density networks, which suffer from poor stability at high dimensions, or (2) multiple choice learning (MCL), an approach that uses $M$ single-output functions, each only producing a point estimate hypothesis. This paper presents a Mixture of Multiple-Output functions (MoM) approach using a novel variant of dropout, Multiple Hypothesis Dropout. Unlike traditional MCL-based approaches, each multiple-output function not only estimates the mean but also the variance for its hypothesis. This is achieved through a novel stochastic winner-take-all loss which allows each multiple-output function to estimate variance through the spread of its subnetwork predictions. Experiments on supervised learning problems illustrate that our approach outperforms existing solutions for reconstructing multimodal output distributions. Additional studies on unsupervised learning problems show that estimating the parameters of latent posterior distributions within a discrete autoencoder significantly improves codebook efficiency, sample quality, precision and recall.
CV · May 30, 2023
DualVAE: Controlling Colours of Generated and Real Images
Keerth Rathakumar, David Liebowitz, Christian Walder et al.
Colour-controlled image generation and manipulation are of interest to artists and graphic designers. Vector Quantised Variational AutoEncoders (VQ-VAEs) with an autoregressive (AR) prior are able to produce high-quality images, but lack an explicit representation mechanism to control colour attributes. We introduce DualVAE, a hybrid representation model that provides such control by learning disentangled representations for colour and geometry. The geometry is represented by an image intensity mapping that identifies structural features. The disentangled representation is obtained by two novel mechanisms: (i) a dual branch architecture that separates image colour attributes from geometric attributes, and (ii) a new ELBO that trains the combined colour and geometry representations. DualVAE can control the colour of generated images, and recolour existing images by transferring the colour latent representation obtained from an exemplar image. We demonstrate that DualVAE generates images with FID nearly two times better than VQ-GAN on a diverse collection of datasets, including animated faces, logos and artistic landscapes.
CY · May 25, 2023
Transformative Effects of ChatGPT on Modern Education: Emerging Era of AI Chatbots
Sukhpal Singh Gill, Minxian Xu, Panos Patros et al.
ChatGPT, an AI-based chatbot, was released to provide coherent and useful replies based on analysis of large volumes of data. In this article, leading scientists, researchers and engineers discuss the transformative effects of ChatGPT on modern education. This research seeks to improve our knowledge of ChatGPT capabilities and its use in the education sector, identifying potential concerns and challenges. Our preliminary evaluation concludes that ChatGPT performed differently in each subject area including finance, coding and maths. While ChatGPT has the ability to help educators by creating instructional content, offering suggestions and acting as an online educator to learners by answering questions and promoting group work, there are clear drawbacks in its use, such as the possibility of producing inaccurate or false data and circumventing duplicate content (plagiarism) detectors where originality is essential. The often reported hallucinations within Generative AI in general, and also relevant for ChatGPT, can render its use of limited benefit where accuracy is essential. What ChatGPT lacks is a stochastic measure to help provide sincere and sensitive communication with its users. Academic regulations and evaluation practices used in educational institutions need to be updated, should ChatGPT be used as a tool in education. To address the transformative effects of ChatGPT on the learning environment, educating teachers and students alike about its capabilities and limitations will be crucial.
CR · Nov 23, 2021
Is this IoT Device Likely to be Secure? Risk Score Prediction for IoT Devices Using Gradient Boosting Machines
Carlos A. Rivera Alvarez, Arash Shaghaghi, David D. Nguyen et al.
Security risk assessment and prediction are critical for organisations deploying Internet of Things (IoT) devices. An absolute minimum requirement for enterprises is to verify the security risk of IoT devices for the reported vulnerabilities in the National Vulnerability Database (NVD). This paper proposes a novel risk prediction approach for IoT devices based on publicly available information about them. Our approach gives enterprises of all sizes an easy and cost-efficient way to predict the security risk of deploying new IoT devices. After an extensive analysis of the NVD records over the past eight years, we have created a unique, systematic, and balanced dataset for vulnerable IoT devices, including key technical features complemented with functional and descriptive features available from public resources. We then use machine learning classification models such as Gradient Boosting Decision Trees (GBDT) over this dataset and achieve 71% prediction accuracy in classifying the severity of device vulnerability score.
CR · Oct 21, 2021
Decentralised Trustworthy Collaborative Intrusion Detection System for IoT
Guntur Dharma Putra, Volkan Dedeoglu, Abhinav Pathak et al.
Intrusion Detection Systems (IDS) have been the industry standard for securing IoT networks against known attacks. To increase the capability of an IDS, researchers proposed the concept of blockchain-based Collaborative-IDS (CIDS), wherein blockchain acts as a decentralised platform allowing collaboration between CIDS nodes to share intrusion related information, such as intrusion alarms and detection rules. However, proposals in blockchain-based CIDS overlook the importance of continuous evaluation of the trustworthiness of each node and generally work based on the assumption that the nodes are always honest. In this paper, we propose a decentralised CIDS that emphasises the importance of building trust between CIDS nodes. In our proposed solution, each CIDS node exchanges detection rules to help other nodes detect new types of intrusion. Our architecture offloads the trust computation to the blockchain and utilises decentralised storage to host the shared trustworthy detection rules, ensuring scalability. Our implementation in a lab-scale testbed shows that our solution is feasible and performs within the expected benchmarks of the Ethereum platform.
CR · May 24, 2021
TradeChain: Decoupling Traceability and Identity in Blockchain enabled Supply Chains
Sidra Malik, Naman Gupta, Volkan Dedeoglu et al.
In this work, we propose a privacy-preserving framework, TradeChain, which decouples the trade events of participants using decentralised identities. TradeChain adopts the Self-Sovereign Identity (SSI) principles and makes the following novel contributions: a) it incorporates two separate ledgers: a public permissioned blockchain for maintaining identities and a permissioned blockchain for recording trade flows, b) it uses Zero Knowledge Proofs (ZKPs) on traders' private credentials to prove multiple identities on the trade ledger, and c) it allows data owners to define dynamic access rules for verifying traceability information from the trade ledger using access tokens and Ciphertext Policy Attribute-Based Encryption (CP-ABE). A proof of concept implementation of TradeChain is presented on Hyperledger Indy and Fabric, and an extensive evaluation of execution time, latency and throughput reveals minimal overheads.
CR · Mar 10, 2021
DIMY: Enabling Privacy-preserving Contact Tracing
Nadeem Ahmed, Regio A. Michelin, Wanli Xue et al.
The infection rate of COVID-19 and the lack of an approved vaccine have forced governments and health authorities to adopt lockdowns, increased testing, and contact tracing to reduce the spread of the virus. Digital contact tracing has become a supplement to the traditional manual contact tracing process. However, although a number of digital contact tracing apps have been proposed and deployed, these have not been widely adopted owing to apprehensions surrounding privacy and security. In this paper, we propose a blockchain-based privacy-preserving contact tracing protocol, "Did I Meet You" (DIMY), that provides full-lifecycle data privacy protection on the devices themselves as well as on the back-end servers, to address most of the privacy concerns associated with existing protocols. We have employed Bloom filters to provide efficient privacy-preserving storage, and have used the Diffie-Hellman key exchange for secret sharing among the participants. We show that DIMY provides resilience against many well-known attacks while introducing negligible overheads. DIMY's footprint on the storage space of clients' devices and back-end servers is also significantly lower than that of other similar state-of-the-art apps.
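To make the storage idea concrete, here is a minimal, generic Bloom filter sketch: items are hashed to a few bit positions, so membership can be checked without storing the items themselves. This is not DIMY's actual construction; the class name, the filter size, the number of hash functions, the SHA-256-based hashing scheme, and the byte-string "encounter" values are all assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter over byte strings (illustrative parameters only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing the item with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    bf.add(b"hypothetical-encounter-id")
    print(b"hypothetical-encounter-id" in bf)
```

A Bloom filter never yields false negatives, while its false-positive rate depends on the bit-array size and the number of items inserted, so the storage footprint can be traded against accuracy.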
LG · Dec 14, 2020
HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor Attacks for Data Collection Scenarios
Hassan Ali, Surya Nepal, Salil S. Kanhere et al.
We have witnessed the continuing arms race between backdoor attacks and the corresponding defense strategies on Deep Neural Networks (DNNs). Most state-of-the-art defenses rely on the statistical sanitization of the "inputs" or "latent DNN representations" to capture trojan behaviour. In this paper, we first challenge the robustness of such recently reported defenses by introducing a novel variant of targeted backdoor attack, called "low-confidence backdoor attack". We also propose a novel defense technique, called "HaS-Nets". "Low-confidence backdoor attack" exploits the confidence labels assigned to poisoned training samples by giving low values to hide their presence from the defender, both during training and inference. We evaluate the attack against four state-of-the-art defense methods, viz., STRIP, Gradient-Shaping, Februus and ULP-defense, and achieve Attack Success Rate (ASR) of 99%, 63.73%, 91.2% and 80%, respectively. We next present "HaS-Nets" to resist backdoor insertion in the network during training, using a reasonably small healing dataset, approximately 2% to 15% of full training data, to heal the network at each iteration. We evaluate it for different datasets - Fashion-MNIST, CIFAR-10, Consumer Complaint and Urban Sound - and network architectures - MLPs, 2D-CNNs, 1D-CNNs. Our experiments show that "HaS-Nets" can decrease ASRs from over 90% to less than 15%, independent of the dataset, attack configuration and network architecture.
NI · Oct 26, 2020
Energy and Service-priority aware Trajectory Design for UAV-BSs using Double Q-Learning
Sayed Amir Hoseini, Ayub Bokani, Jahan Hassan et al.
Next-generation mobile networks have proposed the integration of Unmanned Aerial Vehicles (UAVs) as aerial base stations (UAV-BSs) to serve ground nodes. Despite the advantages of UAV-BSs, their dependence on limited-capacity on-board batteries hinders their service continuity. Shorter trajectories can save flying energy; however, UAV-BSs must also serve nodes based on their service priority, since nodes' service requirements are not always the same. In this paper, we present an energy-efficient trajectory optimization for a UAV-assisted IoT system in which the UAV-BS considers the IoT nodes' service priorities in making its movement decisions. We solve the trajectory optimization problem using the Double Q-Learning algorithm. Simulation results reveal that the Q-Learning based optimized trajectory outperforms a benchmark algorithm, namely the Greedily-served algorithm, in terms of reducing the average energy consumption of the UAV-BS as well as the service delay for high-priority nodes.
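The standard tabular Double Q-Learning update can be sketched as follows: two value tables are kept, one is chosen at random to be updated, and the other evaluates the action the first one considers best, which reduces the overestimation bias of plain Q-learning. The state and action names, the reward value, and the hyperparameters below are hypothetical; the paper's actual state space and its energy and service-priority reward are not specified in the abstract.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state, actions, alpha, gamma, rng):
    """Apply one tabular Double Q-Learning update in place (illustrative only)."""
    # Pick at random which table is updated; the other one evaluates the target.
    if rng.random() < 0.5:
        update, evaluate = q_a, q_b
    else:
        update, evaluate = q_b, q_a
    # Greedy next action according to the table being updated.
    best_next = max(actions, key=lambda a: update.get((next_state, a), 0.0))
    # Value of that action according to the other table.
    target = reward + gamma * evaluate.get((next_state, best_next), 0.0)
    old = update.get((state, action), 0.0)
    update[(state, action)] = old + alpha * (target - old)


if __name__ == "__main__":
    q_a, q_b = {}, {}
    rng = random.Random(0)
    # Hypothetical grid cells and moves for a UAV-BS; the reward is a placeholder.
    double_q_update(q_a, q_b, "cell_0", "north", 1.0, "cell_1",
                    ["north", "south", "east", "west"], alpha=0.5, gamma=0.9, rng=rng)
    print(q_a, q_b)
```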
CR · Oct 23, 2020
Towards Decentralized IoT Updates Delivery Leveraging Blockchain and Zero-Knowledge Proofs
Edoardo Puggioni, Arash Shaghaghi, Robin Doss et al.
We propose CrowdPatching, a blockchain-based decentralized protocol, allowing Internet of Things (IoT) manufacturers to delegate the delivery of software updates to self-interested distributors in exchange for cryptocurrency. Manufacturers announce updates by deploying a smart contract (SC), which in turn will issue cryptocurrency payments to any distributor who provides an unforgeable proof-of-delivery. The latter is provided by IoT devices authorizing the SC to issue payment to a distributor when the required conditions are met. These conditions include the requirement for a distributor to produce a zero-knowledge proof, generated with a novel proving system called zk-SNARKs. Compared with related work, the CrowdPatching protocol offers three main advantages. First, the number of distributors can scale indefinitely by enabling the addition of new distributors at any time after the initial distribution by manufacturers (i.e., redistribution among the distributor network). The latter is not possible in existing protocols and is not accounted for. Secondly, we leverage the now-common integration of a gateway or hub in IoT deployments to make CrowdPatching feasible even for more constrained IoT devices. Thirdly, the trustworthiness of distributors is considered in our protocol, rewarding the honest distributors' engagements. We provide both informal and formal security analysis of CrowdPatching using Tamarin Prover.
CR · Sep 15, 2020
Privacy in Targeted Advertising: A Survey
Imdad Ullah, Roksana Boreli, Salil S. Kanhere
Targeted advertising has transformed the marketing landscape for a wide variety of businesses by creating new opportunities for advertisers to reach prospective customers with personalised ads, using an infrastructure of intermediary entities and technologies. The advertising and analytics companies collect, aggregate, process and trade a vast amount of users' personal data, which has prompted serious privacy concerns among both individuals and organisations. This article presents a detailed survey of the associated privacy risks and proposed solutions in a mobile environment. We outline details of the information flow between the advertising platform and ad/analytics networks, the profiling process, advertising sources and criteria, the measurement analysis of targeted advertising based on users' interests and profiling context, and the ads delivery process, for both in-app and in-browser targeted ads; we also include an overview of data sharing and tracking technologies. We discuss challenges in preserving user privacy that include threats related to private information extraction and exchange among various advertising entities, privacy threats from third-party tracking, re-identification of private information and associated privacy risks. Subsequently, we present various techniques for preserving user privacy and a comprehensive analysis of the proposals based on such techniques; we compare the proposals based on the underlying architectures, privacy mechanisms and deployment scenarios. Finally, we discuss the potential research challenges and open research issues.
CR · Aug 24, 2020
Privacy-preserving targeted mobile advertising: A Blockchain-based framework for mobile ads
Imdad Ullah, Salil S. Kanhere, Roksana Boreli
Targeted advertising is based on preference profiles inferred via relationships among individuals, their monitored responses to previous advertising and temporal activity over the Internet, which has raised critical privacy concerns. In this paper, we present a novel proposal for a Blockchain-based advertising platform that provides: a system for privacy-preserving user profiling, a means of privately requesting ads from the advertising system, the billing mechanisms for presented and clicked ads, the advertising system that uploads ads to the cloud according to profiling interests, various types of transactions to enable advertising operations in a Blockchain-based network, and the method that allows a cloud system to privately compute the access policies for various resources (such as ads and mobile user profiles). Our main goal is to design a decentralized framework for targeted ads, which enables private delivery of ads to users whose behavioral profiles accurately match the presented ads, as defined by the ad system. We implement a proof of concept (PoC) of our proposed framework, i.e., a Bespoke Miner, and experimentally evaluate critical components of the Blockchain-based in-app advertising system, such as evaluating user profiles, implementing access policies, and encrypting and decrypting users' profiles. We observe that the processing delay for traversing policies of various tree sizes, and the encryption/decryption time for user profiles with various key sizes and various interests, amount to an acceptable processing time comparable to that of currently implemented ad systems.
CR · Jul 20, 2020
B-FERL: Blockchain based Framework for Securing Smart Vehicles
Chuka Oham, Regio Michelin, Salil S. Kanhere et al.
The ubiquity of connecting technologies in smart vehicles and the incremental automation of their functionalities promise significant benefits, including a substantial decline in congestion and road fatalities. However, increasing automation and connectedness broadens the attack surface and heightens the likelihood of a malicious entity successfully executing an attack. In this paper, we propose a Blockchain based Framework for sEcuring smaRt vehicLes (B-FERL). B-FERL uses permissioned blockchain technology to tailor information access to restricted entities in the connected vehicle ecosystem. It also uses a challenge-response data exchange between the vehicles and roadside units to monitor the internal state of the vehicle to identify cases of in-vehicle network compromise. In order to enable authentic and valid communication in the vehicular network, only vehicles with a verifiable record in the blockchain can exchange messages. Through qualitative arguments, we show that B-FERL is resilient to identified attacks. Also, quantitative evaluations in an emulated scenario show that B-FERL ensures a suitable response time and required storage size compatible with realistic scenarios. Finally, we demonstrate how B-FERL achieves various important functions relevant to the automotive ecosystem such as trust management, vehicular forensics and secure vehicular networks.
CR · Jun 18, 2020
A Survey of COVID-19 Contact Tracing Apps
Nadeem Ahmed, Regio A. Michelin, Wanli Xue et al.
The recent outbreak of COVID-19 has taken the world by surprise, forcing lockdowns and straining public health care systems. COVID-19 is known to be a highly infectious virus, and infected individuals do not initially exhibit symptoms, while some remain asymptomatic. Thus, a non-negligible fraction of the population can, at any given time, be a hidden source of transmissions. In response, many governments have shown great interest in smartphone contact tracing apps that help automate the difficult task of tracing all recent contacts of newly identified infected individuals. However, tracing apps have generated much discussion around their key attributes, including system architecture, data management, privacy, security, proximity estimation, and attack vulnerability. In this article, we provide the first comprehensive review of these much-discussed tracing app attributes. We also present an overview of many proposed tracing app examples, some of which have been deployed countrywide, and discuss the concerns users have reported regarding their usage. We close by outlining potential research directions for next-generation app design, which would facilitate improved tracing and security performance, as well as wide adoption by the population at large.
CR · May 2, 2020
Context-based smart contracts for appendable-block blockchains
Henry C. Nunes, Roben C. Lunardi, Avelin F. Zorzo et al.
Currently, blockchain proposals are being adopted to solve security issues, such as data integrity, resilience, and non-repudiation. To improve certain aspects of traditional blockchains, e.g., energy consumption and latency, different architectures, algorithms, and data management methods have been recently proposed. For example, appendable-block blockchain uses a different data structure designed to reduce latency in block and transaction insertion. It is especially applicable in domains such as Internet of Things (IoT), where both latency and energy are key concerns. However, the lack of some features available to other blockchains, such as Smart Contracts, limits the application of this model. To solve this, in this work, we propose the use of Smart Contracts in appendable-block blockchain through a new model called context-based appendable-block blockchain. This model also allows the execution of multiple smart contracts in parallel, featuring high performance in parallel computing scenarios. Furthermore, we present an implementation for the context-based appendable-block blockchain using an Ethereum Virtual Machine (EVM). Finally, we execute this implementation in four different testbeds. The results demonstrate a performance improvement for parallel processing of smart contracts when using the proposed model.
CR · Apr 4, 2020
Attacking with bitcoin: Using Bitcoin to Build Resilient Botnet Armies
Dimitri Kamenski, Arash Shaghaghi, Matthew Warren et al.
We focus on the problem of botnet orchestration and discuss how attackers can leverage decentralised technologies to dynamically control botnets with the goal of having botnets that are resilient against hostile takeovers. We cover critical elements of the Bitcoin blockchain and its usage for 'floating command and control servers'. We further discuss how blockchain-based botnets can be built and include a detailed discussion of our implementation. We also showcase how specific Bitcoin APIs can be used in order to write extraneous data to the blockchain. Finally, while we use Bitcoin in this paper to build our resilient botnet proof of concept, the threat is not limited to the Bitcoin blockchain and can be generalized.
CR · Feb 6, 2020
Energy-aware Demand Selection and Allocation for Real-time IoT Data Trading
Pooja Gupta, Volkan Dedeoglu, Kamran Najeebullah et al.
Personal IoT data is a new economic asset that individuals can trade to generate revenue on the emerging data marketplaces. Typically, marketplaces are centralized systems that raise concerns about privacy, single points of failure and limited transparency, and that rely on trusted intermediaries to ensure fairness. Furthermore, battery-operated IoT devices limit the amount of IoT data that can be traded in real time, which affects buyer/seller satisfaction and hence the sustainability and usability of such a marketplace. This work proposes to utilize blockchain technology to realize a trusted and transparent decentralized marketplace for contract compliance for trading IoT data streams generated by battery-operated IoT devices in real time. The contribution of this paper is two-fold: (1) we propose an autonomous blockchain-based marketplace equipped with essential functionalities such as an agreement framework, a pricing model and a rating mechanism to create an effective marketplace framework without involving a mediator, and (2) we propose a mechanism for selection and allocation of buyers' demands on sellers' devices under quality and battery constraints. We present a proof-of-concept implementation in Ethereum to demonstrate the feasibility of the framework. We investigate the impact of buyers' demand on the battery drainage of the IoT devices under different scenarios through extensive simulations. Our results show that this approach is viable and benefits both seller and buyer in creating a sustainable marketplace model for trading IoT data in real time from battery-powered IoT devices.
CR · Dec 23, 2019
Leveraging lightweight blockchain to establish data integrity for surveillance cameras
Regio A. Michelin, Nadeem Ahmed, Salil S. Kanhere et al.
The video footage produced by surveillance cameras is important evidence to support criminal investigations. Video evidence can be sourced from public (trusted) as well as private (untrusted) surveillance systems. This raises the issue of establishing integrity and auditability for information provided by the untrusted video sources. In this paper, we focus on an airport ecosystem, where multiple entities with varying levels of trust are involved in producing and exchanging video surveillance information. We present a framework to ensure the data integrity of the stored videos, allowing authorities to validate whether video footage has been tampered with. Our proposal uses a lightweight blockchain technology to store the video metadata as blockchain transactions to support the validation of video integrity. The proposed framework also ensures video auditability and non-repudiation. Our evaluations show that employing the blockchain to create and query the transactions introduces only a very minor latency of a few milliseconds.
CR · Dec 23, 2019
Impact of consensus on appendable-block blockchain for IoT
Roben C. Lunardi, Regio A. Michelin, Charles V. Neu et al.
The Internet of Things (IoT) is transforming our physical world into a complex and dynamic system of connected devices on an unprecedented scale. Connecting everyday physical objects is creating new business models, improving processes and reducing costs and risks. Recently, blockchain technology has received a lot of attention from the community as a possible solution to overcome security issues in IoT. However, traditional blockchains (such as the ones used in Bitcoin and Ethereum) are not well suited to the resource-constrained nature of IoT devices or to the large volume of information that is expected to be generated from typical IoT deployments. To overcome these issues, several researchers have presented lightweight instances of blockchains tailored for IoT, for example by proposing novel data structures based on blocks with decoupled and appendable data. However, these researchers did not discuss how the consensus algorithm would impact their solutions, i.e., the decision of which consensus algorithm would be better suited was left as an open issue. In this paper, we improved an appendable-block blockchain framework to support different consensus algorithms through a modular design. We evaluated the performance of this improved version in different emulated scenarios and studied the impact of varying the number of devices and transactions and employing different consensus algorithms. Even when adopting different consensus algorithms, results indicate that the latency to append a new block is less than 161ms (in the more demanding scenario) and the delay for processing a new transaction is less than 7ms, suggesting that our improved version of the appendable-block blockchain is efficient and scalable, and thus well suited for IoT scenarios.
CRDec 21, 2019
Trust Management in Decentralized IoT Access Control SystemGuntur Dharma Putra, Volkan Dedeoglu, Salil S. Kanhere et al.
Heterogeneous and dynamic IoT environments require a lightweight, scalable, and trustworthy access control system for protection from unauthorized access and for automated detection of compromised nodes. Recent proposals in IoT access control systems have incorporated blockchain to overcome inherent issues in conventional access control schemes. However, the dynamic interaction of IoT networks remains uncaptured. Here, we develop a blockchain based Trust and Reputation System (TRS) for IoT access control, which progressively evaluates and calculates the trust and reputation score of each participating node to achieve a self-adaptive and trustworthy access control system. Trust and reputation are explicitly incorporated in the attribute-based access control policy, so that different nodes can be assigned to different access right levels, resulting in dynamic access control policies. We implement our proposed architecture in a private Ethereum blockchain comprised of a Docker container network. We benchmark our solution using various performance metrics to highlight its applicability for IoT contexts.
CRDec 3, 2019
A journey in applying blockchain for cyberphysical systemsVolkan Dedeoglu, Ali Dorri, Raja Jurdak et al.
Cyberphysical Systems (CPS) are transforming the way we interact with the physical world around us. However, centralised approaches for CPS are not capable of addressing the unique challenges of CPS due to the complexity, constraints, and dynamic nature of the interactions. To realize the true potential of CPS, a decentralized approach that takes into account these unique features is required. Recently, blockchain-based solutions have been proposed to address CPS challenges. Yet, applying blockchain for diverse CPS domains is not straightforward and has its own challenges. In this paper, we share our experiences in applying blockchain technology for CPS to provide insights and highlight the challenges and future opportunities.
CRJun 27, 2019
A Trust Architecture for Blockchain in IoTVolkan Dedeoglu, Raja Jurdak, Guntur D. Putra et al.
Blockchain is a promising technology for establishing trust in IoT networks, where network nodes do not necessarily trust each other. Cryptographic hash links and distributed consensus mechanisms ensure that the data stored on an immutable blockchain cannot be altered or deleted. However, blockchain mechanisms do not guarantee the trustworthiness of data at the origin. We propose a layered architecture for improving the end-to-end trust that can be applied to a diverse range of blockchain-based IoT applications. Our architecture evaluates the trustworthiness of sensor observations at the data layer and adapts block verification at the blockchain layer through the proposed data trust and gateway reputation modules. We present the performance evaluation of the data trust module using a simulated indoor target localization and the gateway reputation module using an end-to-end blockchain implementation, together with a qualitative security analysis for the architecture.
CRJun 5, 2019
TrustChain: Trust Management in Blockchain and IoT supported Supply ChainsSidra Malik, Volkan Dedeoglu, Salil S. Kanhere et al.
Traceability and integrity are major challenges for the increasingly complex supply chains of today's world. Although blockchain technology has the potential to address these challenges by providing a tamper-proof audit trail of supply chain events and data associated with a product life-cycle, it does not solve the trust problem associated with the data itself. Reputation systems are an effective approach to solve this trust problem. However, current reputation systems are not suited to blockchain based supply chain applications as they are based on limited observations, they lack granularity and automation, and their overhead has not been explored. In this work, we propose TrustChain, a three-layered trust management framework which uses a consortium blockchain to track interactions among supply chain participants and to dynamically assign trust and reputation scores based on these interactions. The novelty of TrustChain stems from: (a) the reputation model that evaluates the quality of commodities, and the trustworthiness of entities based on multiple observations of supply chain events, (b) its support for reputation scores that distinguish between a supply chain participant and products, enabling the assignment of product-specific reputations for the same participant, (c) the use of smart contracts for transparent, efficient, secure, and automated calculation of reputation scores, and (d) its minimal overhead in terms of latency and throughput when compared to a simple blockchain based supply chain model.
CRNov 6, 2018
Blockchain based Proxy Re-Encryption Scheme for Secure IoT Data SharingAhsan Manzoor, Madhsanka Liyanage, An Braeken et al.
Data is central to the Internet of Things (IoT) ecosystem. Most current IoT systems use centralized cloud-based data sharing systems, which will be difficult to scale up to meet the demands of future IoT systems. Involving such a third-party service provider also requires trust from both the sensor owner and the sensor data user. Moreover, fees need to be paid for their services. To tackle both the scalability and trust issues and to automate the payments, this paper presents a blockchain based proxy re-encryption scheme. The system stores the IoT data in a distributed cloud after encryption. To share the collected IoT data, the system establishes runtime dynamic smart contracts between the sensor and data user without the involvement of a trusted third party. It also uses a very efficient proxy re-encryption scheme which ensures that the data is visible only to the owner and the person present in the smart contract. This novel combination of smart contracts with proxy re-encryption provides an efficient, fast and secure platform for storing, trading and managing sensor data. The proposed system is implemented in an Ethereum based testbed to analyze the performance and the security properties.
HCOct 4, 2018
Brain2Object: Printing Your Mind from Brain Signals with Spatial Correlation EmbeddingXiang Zhang, Lina Yao, Chaoran Huang et al.
Electroencephalography (EEG) signals are known to manifest differential patterns when individuals visually concentrate on different objects. In this work, we present an end-to-end digital fabrication system, Brain2Object, to print the 3D object that an individual is observing by decoding visually-evoked brain signals. We propose a unified training framework that combines multi-class Common Spatial Pattern and Convolutional Neural Networks to support the backend computation. We learn the dynamical graph representations of brain signals to accurately capture the structural information among EEG channels. A user-friendly interface is developed as the system front end. Brain2Object presents a streamlined end-to-end workflow that can serve as a template for deeper integration of BCI technologies to assist with our routine activities. The proposed system is evaluated extensively using offline experiments and through an online demonstrator. The experimental results show that our approach can achieve a recognition accuracy of 92.58% on a benchmark dataset and 75.23% on a locally collected dataset. Moreover, our method consistently outperforms a wide range of baseline and state-of-the-art approaches. The proof-of-concept corroborates the practicality of our approach and illustrates the ease with which such a system could be deployed.
CRSep 19, 2018
Gwardar: Towards Protecting a Software-Defined Network from Malicious Network Operating SystemsArash Shaghaghi, Salil S. Kanhere, Mohamed Ali Kaafar et al.
A Software-Defined Network (SDN) controller (a.k.a. Network Operating System or NOS) is regarded as the brain of the network and is the single most critical element responsible for managing an SDN. Complementary to existing solutions that aim to protect a NOS, we propose an intrusion protection system designed to protect an SDN against a controller that has been successfully compromised. Gwardar maintains a virtual replica of the data plane by intercepting the OpenFlow messages exchanged between the control and data plane. By observing the long-term flow of the packets, Gwardar learns the normal set of trajectories in the data plane for distinct packet headers. Upon detecting an unexpected packet trajectory, it starts by verifying the data plane forwarding devices by comparing the actual packet trajectories with the expected ones computed over the virtual replica. If the anomalous trajectories match the NOS instructions, Gwardar inspects the NOS itself. For this, it submits policies matching the normal set of trajectories and verifies whether the controller submits matching flow rules to the data plane and whether the network view provided to the application plane reflects the changes. Our evaluation results prove the practicality of Gwardar with a high detection accuracy in a reasonable time-frame.
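The two-stage check described in this abstract (flag an unexpected trajectory, then compare it with the trajectory expected from the virtual replica) can be sketched roughly as follows. This is an illustrative toy, not Gwardar's implementation: the names `TrajectoryMonitor`, `expected_trajectory`, and `diagnose`, and the representation of the replica as a nested dictionary of next hops keyed by switch and packet header, are assumptions made for this example.

```python
class TrajectoryMonitor:
    """Remembers the trajectories seen so far for each packet header (hypothetical)."""

    def __init__(self):
        self.normal = {}  # header -> set of trajectories (tuples of switch ids)

    def learn(self, header, trajectory):
        self.normal.setdefault(header, set()).add(tuple(trajectory))

    def is_unexpected(self, header, trajectory):
        return tuple(trajectory) not in self.normal.get(header, set())


def expected_trajectory(replica, start, header, max_hops=32):
    """Follow the replica's flow rules: replica[switch][header] gives the next hop.

    A missing rule is treated as the end of the path; max_hops guards against loops.
    """
    path = [start]
    node = start
    for _ in range(max_hops):
        next_hop = replica.get(node, {}).get(header)
        if next_hop is None:
            break
        path.append(next_hop)
        node = next_hop
    return tuple(path)


def diagnose(monitor, replica, header, observed):
    """Classify an observed trajectory following the two-stage check."""
    if not monitor.is_unexpected(header, observed):
        return "normal"
    # Stage 1: does the data plane follow the rules recorded in the replica?
    if tuple(observed) != expected_trajectory(replica, observed[0], header):
        return "suspect forwarding device"
    # Stage 2: the devices obey the controller, so the rules themselves are unusual.
    return "suspect controller"


monitor = TrajectoryMonitor()
monitor.learn("10.0.0.2", ["s1", "s2", "s3"])
replica = {"s1": {"10.0.0.2": "s4"}, "s4": {"10.0.0.2": "s3"}}
print(diagnose(monitor, replica, "10.0.0.2", ["s1", "s4", "s3"]))
```

In the usage lines, the replica holds rules that route traffic through `s4`, a path never seen during learning, so an observed trajectory that follows those rules points at the controller rather than at a forwarding device.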