Isabel Praça

CR
h-index: 29
35 papers
325 citations
Novelty: 23%
AI Score: 48

35 Papers

LG May 26
Evaluating Local Explainability Metrics for Machine Learning Models on Tabular Data

Tomás Pereira, João Vitorino, Eva Maia et al.

Despite the wide use of explainability techniques to attempt to understand the behavior of Artificial Intelligence (AI), the generated explanations may not always be reliable. An explanation can appear plausible to humans but fail to capture the internal reasoning of a model, particularly when dealing with complex tabular data. This paper studies the trustworthiness of local explainability techniques when applied to complex tabular classification tasks, considering evaluation metrics for three main properties: faithfulness to the model's predictions, robustness to input data variations, and complexity of the explanation itself. A benchmark was performed for Local Interpretable Model-Agnostic Explanations (LIME), Kernel SHapley Additive exPlanations (SHAP), and Feature Ablation techniques, across 32 datasets and different types of machine learning models. Model performance ranges were analyzed to identify two groups: consensus-correct, which are samples that all models predicted correctly, and consensus-wrong, samples that all models predicted incorrectly. The obtained results demonstrate that the explanations are not always correlated with a model's predictive performance. Instead, dataset complexity and feature distributions seem to be the main factors affecting explanation quality and reliability.
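
The two sample groups named in this abstract can be illustrated with a minimal sketch. It assumes that each model's predictions are available as a plain list aligned with the true labels; the function name, model names, and toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def consensus_groups(
    y_true: List[int], predictions: Dict[str, List[int]]
) -> Tuple[List[int], List[int]]:
    """Return indices of consensus-correct and consensus-wrong samples."""
    correct: List[int] = []
    wrong: List[int] = []
    for i, label in enumerate(y_true):
        # One boolean per model: did it predict sample i correctly?
        hits = [preds[i] == label for preds in predictions.values()]
        if all(hits):
            correct.append(i)   # every model was right
        elif not any(hits):
            wrong.append(i)     # every model was wrong
        # samples with mixed outcomes belong to neither group
    return correct, wrong


# Hypothetical toy data: five samples, three models.
y_true = [0, 1, 1, 0, 1]
predictions = {
    "model_a": [0, 1, 0, 1, 1],
    "model_b": [0, 1, 0, 0, 1],
    "model_c": [0, 0, 0, 1, 1],
}
consensus_correct, consensus_wrong = consensus_groups(y_true, predictions)
```

Samples on which the models disagree fall into neither group, so the two lists need not cover the whole dataset.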

CR Mar 8, 2022
Adaptative Perturbation Patterns: Realistic Adversarial Learning for Robust Intrusion Detection

João Vitorino, Nuno Oliveira, Isabel Praça

Adversarial attacks pose a major threat to machine learning and to the systems that rely on it. In the cybersecurity domain, adversarial cyber-attack examples capable of evading detection are especially concerning. Nonetheless, an example generated for a domain with tabular data must be realistic within that domain. This work establishes the fundamental constraint levels required to achieve realism and introduces the Adaptative Perturbation Pattern Method (A2PM) to fulfill these constraints in a gray-box setting. A2PM relies on pattern sequences that are independently adapted to the characteristics of each class to create valid and coherent data perturbations. The proposed method was evaluated in a cybersecurity case study with two scenarios: Enterprise and Internet of Things (IoT) networks. Multilayer Perceptron (MLP) and Random Forest (RF) classifiers were created with regular and adversarial training, using the CIC-IDS2017 and IoT-23 datasets. In each scenario, targeted and untargeted attacks were performed against the classifiers, and the generated examples were compared with the original network traffic flows to assess their realism. The obtained results demonstrate that A2PM provides a scalable generation of realistic adversarial examples, which can be advantageous for both adversarial training and attacks.

CR Jan 30, 2023
Towards Adversarial Realism and Robust Learning for IoT Intrusion Detection and Classification

João Vitorino, Isabel Praça, Eva Maia

The Internet of Things (IoT) faces tremendous security challenges. Machine learning models can be used to tackle the growing number of cyber-attack variations targeting IoT systems, but the increasing threat posed by adversarial attacks restates the need for reliable defense strategies. This work describes the types of constraints required for a realistic adversarial cyber-attack example and proposes a methodology for a trustworthy adversarial robustness analysis with a realistic adversarial evasion attack vector. The proposed methodology was used to evaluate three supervised algorithms, Random Forest (RF), Extreme Gradient Boosting (XGB), and Light Gradient Boosting Machine (LGBM), and one unsupervised algorithm, Isolation Forest (IFOR). Constrained adversarial examples were generated with the Adaptative Perturbation Pattern Method (A2PM), and evasion attacks were performed against models created with regular and adversarial training. Even though RF was the least affected in binary classification, XGB consistently achieved the highest accuracy in multi-class classification. The obtained results evidence the inherent susceptibility of tree-based algorithms and ensembles to adversarial evasion attacks and demonstrate the benefits of adversarial training and a security-by-design approach for a more robust IoT network intrusion detection and cyber-attack classification.

CR Aug 13, 2023
SoK: Realistic Adversarial Attacks and Defenses for Intelligent Network Intrusion Detection

João Vitorino, Isabel Praça, Eva Maia

Machine Learning (ML) can be incredibly valuable to automate anomaly detection and cyber-attack classification, improving the way that Network Intrusion Detection (NID) is performed. However, despite the benefits of ML models, they are highly susceptible to adversarial cyber-attack examples specifically crafted to exploit them. A wide range of adversarial attacks have been created and researchers have worked on various defense strategies to safeguard ML models, but most were not intended for the specific constraints of a communication network and its communication protocols, so they may lead to unrealistic examples in the NID domain. This Systematization of Knowledge (SoK) consolidates and summarizes the state-of-the-art adversarial learning approaches that can generate realistic examples and could be used in real ML development and deployment scenarios with real network traffic flows. This SoK also describes the open challenges regarding the use of adversarial ML in the NID domain, defines the fundamental properties that are required for an adversarial example to be realistic, and provides guidelines for researchers to ensure that their future experiments are adequate for a real communication network.

LG Jun 6, 2023
From Data to Action: Exploring AI and IoT-driven Solutions for Smarter Cities

Tiago Dias, Tiago Fonseca, João Vitorino et al.

The emergence of smart cities demands harnessing advanced technologies like the Internet of Things (IoT) and Artificial Intelligence (AI) and promises to unlock cities' potential to become more sustainable, efficient, and ultimately livable for their inhabitants. This work introduces an intelligent city management system that provides a data-driven approach to three use cases: (i) analyze traffic information to reduce the risk of traffic collisions and improve driver and pedestrian safety, (ii) identify when and where energy consumption can be reduced to improve cost savings, and (iii) detect maintenance issues like potholes in the city's roads and sidewalks, as well as the beginning of hazards like floods and fires. A case study in Aveiro City demonstrates the system's effectiveness in generating actionable insights that enhance security, energy efficiency, and sustainability, while highlighting the potential of AI and IoT-driven solutions for smart city development.

SE Jul 19, 2024
SCoPE: Evaluating LLMs for Software Vulnerability Detection

José Gonçalves, Tiago Dias, Eva Maia et al.

In recent years, code security has become increasingly important, especially with the rise of interconnected technologies. Detecting vulnerabilities early in the software development process has demonstrated numerous benefits. Consequently, the scientific community started using machine learning for automated detection of source code vulnerabilities. This work explores and refines the CVEFixes dataset, which is commonly used to train models for code-related tasks, specifically the C/C++ subset. To this end, the Source Code Processing Engine (SCoPE), a framework composed of strategized techniques that can be used to reduce the size and normalize C/C++ functions, is presented. The output generated by SCoPE was used to create a new version of CVEFixes. This refined dataset was then employed in a feature representation analysis to assess the effectiveness of the tool's code processing techniques, consisting of fine-tuning three pre-trained LLMs for software vulnerability detection. The results show that SCoPE successfully helped to identify 905 duplicates within the evaluated subset. The LLM results corroborate the literature regarding their suitability for software vulnerability detection, with the best model achieving 53% F1-score.

SE Mar 14, 2023
Constrained Adversarial Learning for Automated Software Testing: a literature review

João Vitorino, Tiago Dias, Tiago Fonseca et al.

It is imperative to safeguard computer applications and information systems against the growing number of cyber-attacks. Automated software testing tools can be developed to quickly analyze many lines of code and detect vulnerabilities by generating function-specific testing data. This process draws similarities to the constrained adversarial examples generated by adversarial machine learning methods, so there could be significant benefits to the integration of these methods in testing tools to identify possible attack vectors. Therefore, this literature review is focused on the current state-of-the-art of constrained data generation approaches applied for adversarial learning and software testing, aiming to guide researchers and developers to enhance their software testing tools with adversarial testing methods and improve the resilience and robustness of their information systems. The found approaches were systematized, and the advantages and limitations of those specific for white-box, grey-box, and black-box testing were analyzed, identifying research gaps and opportunities to automate the testing tools with data generated by adversarial attacks.

AI Jun 27, 2023
Herb-Drug Interactions: A Holistic Decision Support System in Healthcare

Andreia Martins, Eva Maia, Isabel Praça

Complementary and alternative medicines are commonly used concomitantly with conventional medications, leading to adverse drug reactions and even fatalities in some cases. Furthermore, the vast number of possible herb-drug interactions prevents health professionals from remembering them or manually searching for them in a database. Decision support systems are a powerful tool that can be used to assist clinicians in making diagnostic and therapeutic decisions in patient care. Therefore, an original and hybrid decision support system was designed to identify herb-drug interactions, applying artificial intelligence techniques to identify new possible interactions. Different machine learning models will be used to strengthen the typical rules engine used in these cases. Thus, using the proposed system, the pharmacy community, people's first line of contact within the Healthcare System, will be able to make better and more accurate therapeutic decisions and mitigate possible adverse events.

SE Jun 6, 2023
TestLab: An Intelligent Automated Software Testing Framework

Tiago Dias, Arthur Batista, Eva Maia et al.

Software systems have become an integral part of modern-day living. Software usage has increased significantly, leading to its growth in both size and complexity. Consequently, software development is becoming a more time-consuming process. In an attempt to accelerate the development cycle, the testing phase is often neglected, leading to the deployment of flawed systems that can have significant implications for the users' daily activities. This work presents TestLab, an intelligent automated software testing framework that attempts to gather a set of testing methods and automate them using Artificial Intelligence to allow continuous testing of software systems at multiple levels from different scopes, ranging from developers to end-users. The tool consists of three modules, each serving a distinct purpose. The first two modules aim to identify vulnerabilities from different perspectives, while the third module enhances traditional automated software testing by automatically generating test cases through source code analysis.

SE Jul 19, 2024
FuzzTheREST: An Intelligent Automated Black-box RESTful API Fuzzer

Tiago Dias, Eva Maia, Isabel Praça

Software's pervasive impact and increasing reliance in the era of digital transformation raise concerns about vulnerabilities, emphasizing the need for software security. Fuzzy testing is a dynamic analysis software testing technique that consists of feeding faulty input data to a System Under Test (SUT) and observing its behavior. Specifically regarding black-box RESTful API testing, recent literature has attempted to automate this technique using heuristics to perform the input search and using the HTTP response status codes for classification. However, most approaches do not keep track of code coverage, which is important to validate the solution. This work introduces a black-box RESTful API fuzzy testing tool that employs Reinforcement Learning (RL) for vulnerability detection. The fuzzer operates via the OpenAPI Specification (OAS) file and a scenarios file, which includes information to communicate with the SUT and the sequences of functionalities to test, respectively. To evaluate its effectiveness, the tool was tested on the Petstore API. The tool found a total of six unique vulnerabilities and achieved 55% code coverage.

AI Jul 4, 2022
Deep Learning for Short-term Instant Energy Consumption Forecasting in the Manufacturing Sector

Nuno Oliveira, Norberto Sousa, Isabel Praça

Electricity is a volatile power source that requires great planning and resource management for both the short and long term. More specifically, in the short term, accurate instant energy consumption forecasting contributes greatly to improving the efficiency of buildings, opening new avenues for the adoption of renewable energy. In that regard, data-driven approaches, namely the ones based on machine learning, are beginning to be preferred over more traditional ones since they provide not only simpler ways of deployment but also state-of-the-art results. In that sense, this work applies and compares the performance of several deep learning algorithms, LSTM, CNN, mixed CNN-LSTM and TCN, in a real testbed within the manufacturing sector. The experimental results suggest that the TCN is the most reliable method for predicting instant energy consumption in the short term.

LG Mar 23, 2023
Adversarial Robustness and Feature Impact Analysis for Driver Drowsiness Detection

João Vitorino, Lourenço Rodrigues, Eva Maia et al.

Drowsy driving is a major cause of road accidents, but drivers are dismissive of the impact that fatigue can have on their reaction times. To detect drowsiness before any impairment occurs, a promising strategy is using Machine Learning (ML) to monitor Heart Rate Variability (HRV) signals. This work presents multiple experiments with different HRV time windows and ML models, a feature impact analysis using Shapley Additive Explanations (SHAP), and an adversarial robustness analysis to assess their reliability when processing faulty input data and perturbed HRV signals. The most reliable model was Extreme Gradient Boosting (XGB) and the optimal time window had between 120 and 150 seconds. Furthermore, SHAP enabled the selection of the 18 most impactful features and the training of new smaller models that achieved a performance as good as the initial ones. Despite the susceptibility of all models to adversarial attacks, adversarial training enabled them to preserve significantly higher results, especially XGB. Therefore, ML models can significantly benefit from realistic adversarial training to provide a more robust driver drowsiness detection.

CR Sep 1, 2022
A Low-Cost Multi-Agent System for Physical Security in Smart Buildings

Tiago Fonseca, Tiago Dias, João Vitorino et al.

Modern organizations face numerous physical security threats, from fire hazards to more intricate concerns regarding surveillance and unauthorized personnel. Conventional standalone fire and intrusion detection solutions must be installed and maintained independently, which leads to high capital and operational costs. Nonetheless, due to recent developments in smart sensors, computer vision techniques, and wireless communication technologies, these solutions can be integrated in a modular and low-cost manner. This work introduces Integrated Physical Security System (IP2S), a multi-agent system capable of coordinating diverse Internet of Things (IoT) sensors and actuators for an efficient mitigation of multiple physical security events. The proposed system was tested in a live case study that combined fire and intrusion detection in an industrial shop floor environment with four different sectors, two surveillance cameras, and a firefighting robot. The experimental results demonstrate that the integration of several events in a single automated system can be advantageous for the security of smart buildings, reducing false alarms and delays.

CL Jun 1, 2022
A Multi-Policy Framework for Deep Learning-Based Fake News Detection

João Vitorino, Tiago Dias, Tiago Fonseca et al.

Connectivity plays an ever-increasing role in modern society, with people all around the world having easy access to rapidly disseminated information. However, a more interconnected society enables the spread of intentionally false information. To mitigate the negative impacts of fake news, it is essential to improve detection methodologies. This work introduces Multi-Policy Statement Checker (MPSC), a framework that automates fake news detection by using deep learning techniques to analyze a statement itself and its related news articles, predicting whether it is seemingly credible or suspicious. The proposed framework was evaluated using four merged datasets containing real and fake news. Long-Short Term Memory (LSTM), Gated Recurrent Unit (GRU) and Bidirectional Encoder Representations from Transformers (BERT) models were trained to utilize both lexical and syntactic features, and their performance was evaluated. The obtained results demonstrate that a multi-policy analysis reliably identifies suspicious statements, which can be advantageous for fake news detection.

CR Nov 11, 2025
Binary and Multiclass Cyberattack Classification on GeNIS Dataset

Miguel Silva, Daniela Pinto, João Vitorino et al.

The integration of Artificial Intelligence (AI) in Network Intrusion Detection Systems (NIDS) is a promising approach to tackle the increasing sophistication of cyberattacks. However, since Machine Learning (ML) and Deep Learning (DL) models rely heavily on the quality of their training data, the lack of diverse and up-to-date datasets hinders their generalization capability to detect malicious activity in previously unseen network traffic. This study presents an experimental validation of the reliability of the GeNIS dataset for AI-based NIDS, to serve as a baseline for future benchmarks. Five feature selection methods, Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, were combined to identify the most relevant features of GeNIS and reduce its dimensionality, enabling a more computationally efficient detection. Three decision tree ensembles and two deep neural networks were trained for both binary and multiclass classification tasks. All models reached high accuracy and F1-scores, and the ML ensembles achieved slightly better generalization while remaining more efficient than DL models. Overall, the obtained results indicate that the GeNIS dataset supports intelligent intrusion detection and cyberattack classification with time-based and quantity-based behavioral features.

CR Nov 11, 2025
Revisiting Network Traffic Analysis: Compatible network flows for ML models

João Vitorino, Daniela Pinto, Eva Maia et al.

To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features. However, it can be difficult to accurately represent the complex traffic patterns of an attack, especially in Internet-of-Things (IoT) networks. This paper studies the impact that seemingly similar features created by different network traffic flow exporters can have on the generalization and robustness of ML models. In addition to the original CSV files of the Bot-IoT, IoT-23, and CICIoT23 datasets, the raw network packets of their PCAP files were analysed with the HERA tool, generating new labelled flows and extracting consistent features for new CSV versions. To assess the usefulness of these new flows for intrusion detection, they were compared with the original versions and were used to fine-tune multiple models. Overall, the results indicate that directly analysing and preprocessing PCAP files, instead of just using the commonly available CSV files, enables the computation of more relevant features to train bagging and gradient boosting decision tree ensembles. It is important to continue improving feature extraction and feature selection processes to make different datasets more compatible and enable a trustworthy evaluation and comparison of the ML models used in cybersecurity solutions.

CR Jul 23, 2025 Code
MeAJOR Corpus: A Multi-Source Dataset for Phishing Email Detection

Paulo Mendes, Eva Maia, Isabel Praça

Phishing emails continue to pose a significant threat to cybersecurity by exploiting human vulnerabilities through deceptive content and malicious payloads. While Machine Learning (ML) models are effective at detecting phishing threats, their performance largely relies on the quality and diversity of the training data. This paper presents MeAJOR (Merged email Assets from Joint Open-source Repositories) Corpus, a novel, multi-source phishing email dataset designed to overcome critical limitations in existing resources. It integrates 135894 samples representing a broad number of phishing tactics and legitimate emails, with a wide spectrum of engineered features. We evaluated the dataset's utility for phishing detection research through systematic experiments with four classification models (RF, XGB, MLP, and CNN) across multiple feature configurations. Results highlight the dataset's effectiveness, achieving 98.34% F1 with XGB. By integrating broad features from multiple categories, our dataset provides a reusable and consistent resource, while addressing common challenges like class imbalance, generalisability and reproducibility.

CR Apr 5, 2024
Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection

João Vitorino, Miguel Silva, Eva Maia et al.

The growing cybersecurity threats make it essential to use high-quality data to train Machine Learning (ML) models for network traffic analysis, without noisy or missing data. By selecting the most relevant features for cyber-attack detection, it is possible to improve both the robustness and computational efficiency of the models used in a cybersecurity system. This work presents a feature selection and consensus process that combines multiple methods and applies them to several network datasets. Two different feature sets were selected and were used to train multiple ML models with regular and adversarial training. Finally, an adversarial evasion robustness benchmark was performed to analyze the reliability of the different feature sets and their impact on the susceptibility of the models to adversarial examples. By using an improved dataset with more data diversity, selecting the best time-related features and a more specific feature set, and performing adversarial training, the ML models were able to achieve a better adversarially robust generalization. The robustness of the models was significantly improved without their generalization to regular traffic flows being affected, without increases of false alarms, and without requiring too many computational resources, which enables a reliable detection of suspicious activity and perturbed traffic flows in enterprise computer networks.
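
The consensus step mentioned in this abstract could be approximated by a simple vote across feature selection methods. The sketch below is illustrative only: the voting rule, the `min_votes` threshold, the method names, and the feature names are all assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Dict, List


def consensus_features(selections: Dict[str, List[str]], min_votes: int) -> List[str]:
    """Keep the features chosen by at least `min_votes` selection methods."""
    # Count each feature once per method, even if a method lists it twice.
    votes = Counter(
        feature for chosen in selections.values() for feature in set(chosen)
    )
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical outputs of three feature selection methods.
selections = {
    "information_gain": ["flow_duration", "fwd_packets", "bwd_bytes"],
    "chi_squared": ["flow_duration", "bwd_bytes", "dst_port"],
    "rfe": ["flow_duration", "fwd_packets", "dst_port"],
}

strict = consensus_features(selections, min_votes=3)   # chosen by every method
relaxed = consensus_features(selections, min_votes=2)  # chosen by a majority
```

Varying the threshold is one way to obtain a smaller, more specific feature set alongside a broader one, although the paper may derive its two feature sets differently.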

LG Mar 10, 2025
Evaluating LLaMA 3.2 for Software Vulnerability Detection

José Gonçalves, Miguel Silva, Bernardo Cabral et al.

Deep Learning (DL) has emerged as a powerful tool for vulnerability detection, often outperforming traditional solutions. However, developing effective DL models requires large amounts of real-world data, which can be difficult to obtain in sufficient quantities. To address this challenge, the DiverseVul dataset has been curated as the largest dataset of vulnerable and non-vulnerable C/C++ functions extracted exclusively from real-world projects. Its goal is to provide high-quality, large-scale samples for training DL models. However, during our study, several inconsistencies were identified in the raw dataset while applying pre-processing techniques, highlighting the need for a refined version. In this work, we present a refined version of the DiverseVul dataset, which is used to fine-tune a large language model, LLaMA 3.2, for vulnerability detection. Experimental results show that the use of pre-processing techniques led to an improvement in performance, with the model achieving an F1-Score of 66%, a competitive result when compared to our baseline, which achieved a 47% F1-Score in software vulnerability detection.

CR Feb 25, 2024
An Adversarial Robustness Benchmark for Enterprise Network Intrusion Detection

João Vitorino, Miguel Silva, Eva Maia et al.

As cyber-attacks become more sophisticated, improving the robustness of Machine Learning (ML) models must be a priority for enterprises of all sizes. To reliably compare the robustness of different ML models for cyber-attack detection in enterprise computer networks, they must be evaluated in standardized conditions. This work presents a methodical adversarial robustness benchmark of multiple decision tree ensembles with constrained adversarial examples generated from standard datasets. The robustness of regularly and adversarially trained RF, XGB, LGBM, and EBM models was evaluated on the original CICIDS2017 dataset, a corrected version of it designated as NewCICIDS, and the HIKARI dataset, which contains more recent network traffic. NewCICIDS led to models with a better performance, especially XGB and EBM, but RF and LGBM were less robust against the more recent cyber-attacks of HIKARI. Overall, the robustness of the models to adversarial cyber-attack examples was improved without their generalization to regular traffic being affected, enabling a reliable detection of suspicious activity without costly increases of false alarms.

SE May 8, 2025
Enhancing Large Language Models with Faster Code Preprocessing for Vulnerability Detection

José Gonçalves, Miguel Silva, Eva Maia et al.

The application of Artificial Intelligence has become a powerful approach to detecting software vulnerabilities. However, effective vulnerability detection relies on accurately capturing the semantic structure of code and its contextual relationships. Given that the same functionality can be implemented in various forms, a preprocessing tool that standardizes code representation is important. This tool must be efficient, adaptable across programming languages, and capable of supporting new transformations. To address this challenge, we build on the existing SCoPE framework and introduce SCoPE2, an enhanced version with improved performance. We compare both versions in terms of processing time and memory usage and evaluate their impact on a Large Language Model (LLM) for vulnerability detection. Our results show a 97.3% reduction in processing time with SCoPE2, along with an improved F1-score for the LLM, solely due to the refined preprocessing approach.

CR Dec 18, 2024
Flow Exporter Impact on Intelligent Intrusion Detection Systems

Daniela Pinto, João Vitorino, Eva Maia et al.

High-quality datasets are critical for training machine learning models, as inconsistencies in feature generation can hinder the accuracy and reliability of threat detection. For this reason, ensuring the quality of the data in network intrusion detection datasets is important. A key component of this is using reliable tools to generate the flows and features present in the datasets. This paper investigates the impact of flow exporters on the performance and reliability of machine learning models for intrusion detection. Using HERA, a tool designed to export flows and extract features, the raw network packets of two widely used datasets, UNSW-NB15 and CIC-IDS2017, were processed from PCAP files to generate new versions of these datasets. These were compared to the original ones in terms of their influence on the performance of several models, including Random Forest, XGBoost, LightGBM, and Explainable Boosting Machine. The results obtained were significant. Models trained on the HERA version of the datasets consistently outperformed those trained on the original dataset, showing improvements in accuracy and indicating a better generalisation. This highlighted the importance of flow generation in the model's ability to differentiate between benign and malicious traffic.

LG Sep 30, 2025
SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards

João Vitorino, Eva Maia, Isabel Praça et al.

Due to the susceptibility of Artificial Intelligence (AI) to data perturbations and adversarial examples, it is crucial to perform a thorough robustness evaluation before any Machine Learning (ML) model is deployed. However, examining a model's decision boundaries and identifying potential vulnerabilities typically requires access to the training and testing datasets, which may pose risks to data privacy and confidentiality. To improve transparency in organizations that handle confidential data or manage critical infrastructure, it is essential to allow external verification and validation of AI without the disclosure of private datasets. This paper presents Systematic Pattern Analysis (SPATA), a deterministic method that converts any tabular dataset to a domain-independent representation of its statistical patterns, to provide more detailed and transparent data cards. SPATA computes the projection of each data instance into a discrete space where they can be analyzed and compared, without risking data leakage. These projected datasets can be reliably used for the evaluation of how different features affect ML model robustness and for the generation of interpretable explanations of their behavior, contributing to more trustworthy AI.

CR Nov 6, 2025
Adversarially Robust and Interpretable Magecart Malware Detection

Pedro Pereira, José Gouveia, João Vitorino et al.

Magecart skimming attacks have emerged as a significant threat to client-side security and user trust in online payment systems. This paper addresses the challenge of achieving robust and explainable detection of Magecart attacks through a comparative study of various Machine Learning (ML) models with a real-world dataset. Tree-based, linear, and kernel-based models were applied, further enhanced through hyperparameter tuning and feature selection, to distinguish between benign and malicious scripts. Such models are supported by a Behavior Deterministic Finite Automaton (DFA) which captures structural behavior patterns in scripts, helping to analyze and classify client-side script execution logs. To ensure robustness against adversarial evasion attacks, the ML models were adversarially trained and evaluated using attacks from the Adversarial Robustness Toolbox and the Adaptative Perturbation Pattern Method. In addition, concise explanations of ML model decisions are provided, supporting transparency and user trust. Experimental validation demonstrated high detection performance and interpretable reasoning, demonstrating that traditional ML models can be effective in real-world web security contexts.

CR Nov 11, 2024
Intelligent Green Efficiency for Intrusion Detection

Pedro Pereira, Paulo Mendes, João Vitorino et al.

Artificial Intelligence (AI) has recently grown in popularity, recording great progress in various industries. However, the environmental impact of AI is a growing concern, in terms of the energy consumption and carbon footprint of Machine Learning (ML) and Deep Learning (DL) models, making it essential to investigate Green AI, an attempt to reduce the climate impact of AI systems. This paper presents an assessment of different programming languages and Feature Selection (FS) methods to improve the computational performance of AI, focusing on Network Intrusion Detection (NID) and cyber-attack classification tasks. Experiments were conducted using five ML models - Random Forest, XGBoost, LightGBM, Multi-Layer Perceptron, and Long Short-Term Memory - implemented in four programming languages - Python, Java, R, and Rust - along with three FS methods - Information Gain, Recursive Feature Elimination, and Chi-Square. The obtained results demonstrated that FS plays an important role in enhancing the computational efficiency of AI models without compromising detection accuracy, and highlighted languages like Python and R, which benefit from a rich ecosystem of AI libraries. These conclusions can be useful to design efficient and sustainable AI systems that still provide good generalization and reliable detection.

CLJun 12, 2024
Adversarial Evasion Attack Efficiency against Large Language Models

João Vitorino, Eva Maia, Isabel Praça

Large Language Models (LLMs) are valuable for text classification, but their vulnerabilities must not be disregarded. They lack robustness against adversarial examples, so it is pertinent to understand the impacts of different types of perturbations, and to assess whether those attacks could be replicated by common users with a small number of perturbations and a small number of queries to a deployed LLM. This work presents an analysis of the effectiveness, efficiency, and practicality of three different types of adversarial attacks against five different LLMs in a sentiment classification task. The obtained results demonstrated the very distinct impacts of the word-level and character-level attacks. The word attacks were more effective, but the character and more constrained attacks were more practical and required a reduced number of perturbations and queries. These differences need to be considered during the development of adversarial defense strategies to train more robust LLMs for intelligent text classification applications.

CRJun 12, 2024
Efficient Network Traffic Feature Sets for IoT Intrusion Detection

Miguel Silva, João Vitorino, Eva Maia et al.

The use of Machine Learning (ML) models in cybersecurity solutions requires high-quality data that is stripped of redundant, missing, and noisy information. By selecting the most relevant features, data integrity and model efficiency can be significantly improved. This work evaluates the feature sets provided by a combination of different feature selection methods, namely Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, in multiple IoT network datasets. The influence of the smaller feature sets on both the classification performance and the training time of ML models is compared, with the aim of increasing the computational efficiency of IoT intrusion detection. Overall, the most impactful features of each dataset were identified, and the ML models obtained higher computational efficiency while preserving a good generalization, showing little to no difference between the sets.

CRMay 3, 2023
Data Privacy with Homomorphic Encryption in Neural Networks Training and Inference

Ivone Amorim, Eva Maia, Pedro Barbosa et al.

The use of Neural Networks (NNs) for sensitive data processing is becoming increasingly popular, raising concerns about data privacy and security. Homomorphic Encryption (HE) has the potential to be used as a solution to preserve data privacy in NNs. This study provides a comprehensive analysis of the use of HE for NN training and classification, focusing on the techniques and strategies used to enhance data privacy and security. The current state-of-the-art in HE for NNs is analysed, and the challenges and limitations that need to be addressed to make it a reliable and efficient approach for privacy preservation are identified. Also, the different categories of HE schemes and their suitability for NNs are discussed, as well as the techniques used to optimize the accuracy and efficiency of encrypted models. The review reveals that HE has the potential to provide strong data privacy guarantees for NNs, but several challenges need to be addressed, such as limited support for advanced NN operations, scalability issues, and performance trade-offs.

CRDec 29, 2021
Anomaly Detection in Cyber-Physical Systems: Reconstruction of a Prediction Error Feature Space

Nuno Oliveira, Norberto Sousa, Jorge Oliveira et al.

Cyber-physical systems are infrastructures that use digital information such as network communications and sensor readings to control entities in the physical world. Many cyber-physical systems in airports, hospitals and nuclear power plants are regarded as critical infrastructures since a disruption of their normal functionality can result in negative consequences for society. In the last few years, some security solutions for cyber-physical systems based on artificial intelligence have been proposed. Nevertheless, domain knowledge is required to properly set up and train artificial intelligence algorithms. Our work proposes a novel anomaly detection framework based on error space reconstruction, where genetic algorithms are used to perform hyperparameter optimization of machine learning methods. The proposed method achieved an F1-score of 87.89% in the SWaT dataset.

CRDec 2, 2021
A tool to support the investigation and visualization of cyber and/or physical incidents

Inês Macedo, Sinan Wanous, Nuno Oliveira et al.

Efficiently investigating the data collected from a system's activity can help to detect malicious attempts and better understand the context behind past incident occurrences. Nowadays, several solutions can be used to monitor system activities to detect probable abnormalities and malfunctions. However, most of these systems overwhelm their users with vast amounts of information, making it harder for them to perceive incident occurrences and their context. Our approach combines a dynamic and intuitive user interface with Machine Learning forecasts to provide an intelligent investigation tool that facilitates the security operator's work. Our system can also act as an enhanced and fully automated decision support mechanism that provides suggestions about possible incident occurrences.

CRNov 25, 2021
A Comparative Analysis of Machine Learning Techniques for IoT Intrusion Detection

João Vitorino, Rui Andrade, Isabel Praça et al.

The digital transformation faces tremendous security challenges. In particular, the growing number of cyber-attacks targeting Internet of Things (IoT) systems restates the need for a reliable detection of malicious network activity. This paper presents a comparative analysis of supervised, unsupervised and reinforcement learning techniques on nine malware captures of the IoT-23 dataset, considering both binary and multi-class classification scenarios. The developed models consisted of Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Isolation Forest (iForest), Local Outlier Factor (LOF) and a Deep Reinforcement Learning (DRL) model based on a Double Deep Q-Network (DDQN), adapted to the intrusion detection context. The most reliable performance was achieved by LightGBM. Nonetheless, iForest displayed good anomaly detection results and the DRL model demonstrated the possible benefits of employing this methodology to continuously improve the detection. Overall, the obtained results indicate that the analyzed techniques are well suited for IoT intrusion detection.

CRNov 19, 2021
A Hybrid Approach for an Interpretable and Explainable Intrusion Detection System

Tiago Dias, Nuno Oliveira, Norberto Sousa et al.

Cybersecurity has been a concern for quite a while now. In recent years, cyberattacks have been increasing in size and complexity, fueled by significant advances in technology. Nowadays, there is an unavoidable need to protect systems and data crucial for business continuity. Hence, many intrusion detection systems have been created in an attempt to mitigate these threats and contribute to a timelier detection. This work proposes an interpretable and explainable hybrid intrusion detection system, which makes use of artificial intelligence methods to achieve better and more long-lasting security. The system combines experts' written rules and dynamic knowledge continuously generated by a decision tree algorithm as new shreds of evidence emerge from network activity.

ROSep 25, 2021
A Multi-Agent System for Autonomous Mobile Robot Coordination

Norberto Sousa, Nuno Oliveira, Isabel Praça

The automation of internal logistics and inventory-related tasks is one of the main challenges of modern-day manufacturing corporations since it allows a more effective application of their human resources. Nowadays, Autonomous Mobile Robots (AMR) are state-of-the-art technologies for such applications due to their great adaptability in dynamic environments, replacing more traditional solutions such as Automated Guided Vehicles (AGV), which are quite limited in terms of flexibility and require expensive facility updates for their installation. The application of Artificial Intelligence (AI) to increase AMR capabilities has been contributing to the development of more sophisticated and efficient robots. Nevertheless, multi-robot coordination and cooperation for solving complex tasks is still an active research line with increasing interest. This work proposes a Multi-Agent System for coordinating multiple TIAGo robots in tasks related to the manufacturing ecosystem such as the transportation and dispatching of raw materials, finished products and tools. Furthermore, the system is showcased in a realistic simulation using both Gazebo and Robot Operating System (ROS).

CRJul 2, 2021
Machine Learning for Network-based Intrusion Detection Systems: an Analysis of the CIDDS-001 Dataset

José Carneiro, Nuno Oliveira, Norberto Sousa et al.

With the increasing reliance on digital data and computer networks by corporations and the public in general, the occurrence of cyber attacks has become a great threat to the normal functioning of our society. Intrusion detection systems seek to address this threat by preemptively detecting attacks in real time while attempting to block them or minimize their damage. These systems can function in many ways, some of them being based on artificial intelligence methods. Datasets containing both normal network traffic and cyber attacks are used for training these algorithms so that they can learn the underlying patterns of network-based data. The CIDDS-001 is one of the most used datasets for network-based intrusion detection research. Regarding this dataset, in the majority of works published so far, the Class label was used for training machine learning algorithms. However, there is another label in the CIDDS-001, AttackType, that seems very promising for this purpose and remains considerably unexplored. This work seeks to make a comparison between two machine learning models, K-Nearest Neighbours and Random Forest, which were trained with both these labels in order to ascertain whether AttackType can produce reliable results in comparison with the Class label.

AIJun 30, 2021
A Search Engine for Scientific Publications: a Cybersecurity Case Study

Nuno Oliveira, Norberto Sousa, Isabel Praça

Cybersecurity is a very challenging topic of research nowadays, as digitalization increases the interaction of people, software and services on the Internet by means of technology devices and networks connected to it. The field is broad and has a lot of unexplored ground under numerous disciplines such as management, psychology, and data science. Its large disciplinary spectrum and many significant research topics generate a considerable amount of information, making it hard for us to find what we are looking for when researching a particular subject. This work proposes a new search engine for scientific publications which combines both information retrieval and reading comprehension algorithms to extract answers from a collection of domain-specific documents. Although applied to the context of cybersecurity, the proposed solution exhibited great generalization capabilities and can be easily adapted to perform under other distinct knowledge domains.