CR · AI · Apr 25

Training Machine Learning Models on Encrypted Data: A Privacy-Preserving Framework using Homomorphic Encryption

Alexandre Marques, Beatriz Sá, Rui Botelho, Pedro Pinto

arXiv:2604.23245 · 27.0

AI Analysis

For ML practitioners who need to train models on sensitive data, this work offers a proof-of-concept showing that homomorphic encryption can preserve privacy at acceptable accuracy. The contribution is incremental, however: comparable methods already exist, and the models covered are limited in scope.

This paper proposes a privacy-preserving framework using homomorphic encryption (CKKS) to train KNN and linear regression models on encrypted data, achieving performance comparable to plaintext models. It also demonstrates encrypted inference for a basic MLP, though computational overhead and noise management remain challenges.

The use of Machine Learning (ML) for data-driven decision-making often relies on access to sensitive datasets, which introduces privacy challenges. Traditional encryption methods protect data at rest or in transit but fail to secure it during processing, exposing it to unauthorized access. Homomorphic encryption emerges as a transformative solution, enabling computations on encrypted data without decryption and thus preserving confidentiality throughout the ML pipeline. This paper addresses the challenge of training ML models on encrypted data while maintaining accuracy and efficiency by proposing a proof-of-concept for a privacy-preserving framework that leverages the Cheon-Kim-Kim-Song (CKKS) scheme for approximate real-number arithmetic. It also demonstrates the feasibility of training K-Nearest Neighbors (KNN) and linear regression models on encrypted data, and evaluates encrypted inference for a basic Multilayer Perceptron (MLP) architecture. Experimental results show that models trained under homomorphic encryption achieve performance metrics comparable to plaintext-trained models, validating the approach. However, challenges such as computational overhead, noise management, and limited support for non-polynomial operations persist. This work lays the groundwork for broader adoption of privacy-preserving ML in real-world applications, balancing security with computational feasibility.
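
To make the "non-polynomial operations" constraint concrete, here is a minimal plaintext sketch, not the paper's implementation. It assumes plain Python floats stand in for CKKS ciphertexts and restricts itself to the two operations CKKS evaluates natively, addition and multiplication. The names `gd_step` and `poly_sigmoid`, the toy data, and the learning rate are all hypothetical. The cubic coefficients are a commonly used approximation from the encrypted logistic-regression literature, not something the abstract specifies.

```python
def dot(a, b):
    # Inner product: multiplications followed by additions only.
    return sum(x * y for x, y in zip(a, b))

def gd_step(weights, bias, X, y, lr_over_n):
    """One batch gradient-descent step for linear regression.

    Only + and * are used. lr_over_n (learning rate / sample count) is
    assumed to be precomputed in the clear, since CKKS has no native division.
    """
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for row, target in zip(X, y):
        err = dot(weights, row) + bias + (-1.0) * target  # prediction error
        grad_w = [g + err * x for g, x in zip(grad_w, row)]
        grad_b = grad_b + err
    new_w = [w + (-lr_over_n) * g for w, g in zip(weights, grad_w)]
    new_b = bias + (-lr_over_n) * grad_b
    return new_w, new_b

def poly_sigmoid(x):
    # Degree-3 polynomial stand-in for sigmoid, roughly valid on [-5, 5].
    # An activation like this is what an encrypted MLP would need, because
    # exp() cannot be evaluated directly on ciphertexts.
    return 0.5 + 0.197 * x + (-0.004) * x * x * x

# Toy data generated from y = 2x + 1 (illustrative only).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w, b = [0.0], 0.0
for _ in range(2000):
    w, b = gd_step(w, b, X, y, lr_over_n=0.1 / len(X))
```

In a real CKKS setting each multiplication consumes part of the noise budget, so the number of training iterations would be bounded by the chosen encryption parameters rather than run freely as in this loop.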
