Isabel PraÃ§a

3.4CRMar 27

Machine Learning Transferability for Malware Detection

César Vieira, João Vitorino, Eva Maia et al.

Malware continues to be a predominant operational risk for organizations, especially when obfuscation techniques are used to evade detection. Despite the ongoing efforts in the development of Machine Learning (ML) detection approaches, there is still a lack of feature compatibility in public datasets. This limits generalization when facing distribution shifts, as well as transferability to different datasets. This study evaluates the suitability of different data preprocessing approaches for the detection of Portable Executable (PE) files with ML models. The preprocessing pipeline unifies EMBERv2 (2,381-dim) features datasets, trains paired models under two training setups: EMBER + BODMAS and EMBER + BODMAS + ERMDS. Regarding model evaluation, both EMBER + BODMAS and EMBER + BODMAS + ERMDS models are tested against TRITIUM, INFERNO and SOREL-20M. ERMDS is also used for testing for the EMBER + BODMAS setup.

2.7CRMar 27

Towards Privacy-Preserving Federated Learning using Hybrid Homomorphic Encryption

Ivan Costa, Pedro Correia, Ivone Amorim et al.

Federated Learning (FL) enables collaborative training while keeping sensitive data on clients' devices, but local model updates can still leak private information. Hybrid Homomorphic Encryption (HHE) has recently been applied to FL to mitigate client overhead while preserving privacy. However, existing HHE-FL systems rely on a single homomorphic key pair shared across all clients, which forces them to assume an unrealistically weak threat model: if a client misbehaves or intercepts another's traffic, private updates can be exposed. We eliminate this weakness by integrating two alternative key protection mechanisms into the HHE-FL workflow. The first is masking, where client keys are blinded before homomorphic encryption and later unblinded homomorphically by the server. The second is RSA encapsulation, where homomorphically encrypted keys are additionally wrapped under the server's RSA public key. These countermeasures prevent key misuse by other clients and extend HHE-FL security to adversarial settings with malicious participants. We implement both approaches on top of the Flower framework using the PASTA/BFV HHE scheme and evaluate them on the MNIST dataset with 12 clients. Results show that both mechanisms preserve model accuracy while adding minimal overhead: masking incurs negligible cost, and RSA encapsulation introduces only modest runtime and communication overhead.

Isabel PraÃ§a

2 Papers