Ivone Amorim

CR
h-index29
5papers
1citation
Novelty21%
AI Score34

5 Papers

CRNov 11, 2025
Binary and Multiclass Cyberattack Classification on GeNIS Dataset

Miguel Silva, Daniela Pinto, João Vitorino et al.

The integration of Artificial Intelligence (AI) in Network Intrusion Detection Systems (NIDS) is a promising approach to tackle the increasing sophistication of cyberattacks. However, since Machine Learning (ML) and Deep Learning (DL) models rely heavily on the quality of their training data, the lack of diverse and up-to-date datasets hinders their generalization capability to detect malicious activity in previously unseen network traffic. This study presents an experimental validation of the reliability of the GeNIS dataset for AI-based NIDS, to serve as a baseline for future benchmarks. Five feature selection methods, Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, were combined to identify the most relevant features of GeNIS and reduce its dimensionality, enabling a more computationally efficient detection. Three decision tree ensembles and two deep neural networks were trained for both binary and multiclass classification tasks. All models reached high accuracy and F1-scores, and the ML ensembles achieved slightly better generalization while remaining more efficient than DL models. Overall, the obtained results indicate that the GeNIS dataset supports intelligent intrusion detection and cyberattack classification with time-based and quantity-based behavioral features.

CRNov 11, 2025
Revisiting Network Traffic Analysis: Compatible network flows for ML models

João Vitorino, Daniela Pinto, Eva Maia et al.

To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features. However, it can be difficult to accurately represent the complex traffic patterns of an attack, especially in Internet-of-Things (IoT) networks. This paper studies the impact that seemingly similar features created by different network traffic flow exporters can have on the generalization and robustness of ML models. In addition to the original CSV files of the Bot-IoT, IoT-23, and CICIoT23 datasets, the raw network packets of their PCAP files were analysed with the HERA tool, generating new labelled flows and extracting consistent features for new CSV versions. To assess the usefulness of these new flows for intrusion detection, they were compared with the original versions and were used to fine-tune multiple models. Overall, the results indicate that directly analysing and preprocessing PCAP files, instead of just using the commonly available CSV files, enables the computation of more relevant features to train bagging and gradient boosting decision tree ensembles. It is important to continue improving feature extraction and feature selection processes to make different datasets more compatible and enable a trustworthy evaluation and comparison of the ML models used in cybersecurity solutions.

16.6CRMar 27
Towards Privacy-Preserving Federated Learning using Hybrid Homomorphic Encryption

Ivan Costa, Pedro Correia, Ivone Amorim et al.

Federated Learning (FL) enables collaborative training while keeping sensitive data on clients' devices, but local model updates can still leak private information. Hybrid Homomorphic Encryption (HHE) has recently been applied to FL to mitigate client overhead while preserving privacy. However, existing HHE-FL systems rely on a single homomorphic key pair shared across all clients, which forces them to assume an unrealistically weak threat model: if a client misbehaves or intercepts another's traffic, private updates can be exposed. We eliminate this weakness by integrating two alternative key protection mechanisms into the HHE-FL workflow. The first is masking, where client keys are blinded before homomorphic encryption and later unblinded homomorphically by the server. The second is RSA encapsulation, where homomorphically encrypted keys are additionally wrapped under the server's RSA public key. These countermeasures prevent key misuse by other clients and extend HHE-FL security to adversarial settings with malicious participants. We implement both approaches on top of the Flower framework using the PASTA/BFV HHE scheme and evaluate them on the MNIST dataset with 12 clients. Results show that both mechanisms preserve model accuracy while adding minimal overhead: masking incurs negligible cost, and RSA encapsulation introduces only modest runtime and communication overhead.

CRDec 18, 2024
Flow Exporter Impact on Intelligent Intrusion Detection Systems

Daniela Pinto, João Vitorino, Eva Maia et al.

High-quality datasets are critical for training machine learning models, as inconsistencies in feature generation can hinder the accuracy and reliability of threat detection. For this reason, ensuring the quality of the data in network intrusion detection datasets is important. A key component of this is using reliable tools to generate the flows and features present in the datasets. This paper investigates the impact of flow exporters on the performance and reliability of machine learning models for intrusion detection. Using HERA, a tool designed to export flows and extract features, the raw network packets of two widely used datasets, UNSW-NB15 and CIC-IDS2017, were processed from PCAP files to generate new versions of these datasets. These were compared to the original ones in terms of their influence on the performance of several models, including Random Forest, XGBoost, LightGBM, and Explainable Boosting Machine. The results obtained were significant. Models trained on the HERA version of the datasets consistently outperformed those trained on the original dataset, showing improvements in accuracy and indicating a better generalisation. This highlighted the importance of flow generation in the model's ability to differentiate between benign and malicious traffic.

CRMay 3, 2023
Data Privacy with Homomorphic Encryption in Neural Networks Training and Inference

Ivone Amorim, Eva Maia, Pedro Barbosa et al.

The use of Neural Networks (NNs) for sensitive data processing is becoming increasingly popular, raising concerns about data privacy and security. Homomorphic Encryption (HE) has the potential to be used as a solution to preserve data privacy in NN. This study provides a comprehensive analysis on the use of HE for NN training and classification, focusing on the techniques and strategies used to enhance data privacy and security. The current state-of-the-art in HE for NNs is analysed, and the challenges and limitations that need to be addressed to make it a reliable and efficient approach for privacy preservation are identified. Also, the different categories of HE schemes and their suitability for NNs are discussed, as well as the techniques used to optimize the accuracy and efficiency of encrypted models. The review reveals that HE has the potential to provide strong data privacy guarantees for NNs, but several challenges need to be addressed, such as limited support for advanced NN operations, scalability issues, and performance trade-offs.