Privacy Preserving K-Means Clustering: A Secure Multi-Party Computation Approach
This work addresses the privacy concerns of organizations or individuals holding sensitive data in distributed databases, though the contribution appears incremental, since it applies existing cryptographic protocols to K-means.
The paper tackles the problem of performing K-means clustering on private data distributed across multiple sources without compromising privacy, using secure multi-party computation to enable knowledge discovery while ensuring data confidentiality.
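The abstract does not say which cryptographic protocols the paper uses. As a hedged illustration of the kind of building block secure multi-party computation relies on, the following sketch implements additive secret sharing and uses it for a secure sum. It assumes semi-honest, non-colluding parties and simulates all of them in one process. `MODULUS`, `share`, `reconstruct`, and `secure_sum` are hypothetical names, not the paper's API.

```python
import secrets

# Hypothetical public modulus; every share is an integer in [0, MODULUS).
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n_parties additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine additive shares into the original value (mod MODULUS)."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs: list[int]) -> int:
    """Simulate n parties jointly computing the sum of their private inputs.

    Party i deals one share of its input to every party j. Any fewer than n
    shares of an input are uniformly random, so the shares a single party
    receives reveal nothing about the other inputs. Each party adds up the
    shares it received; only these partial sums are published and combined.
    """
    n = len(private_inputs)
    dealt = [share(x, n) for x in private_inputs]  # dealt[i][j]: share sent from party i to party j
    partial_sums = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return reconstruct(partial_sums)
```

For example, three parties holding 12, 30, and 7 obtain `secure_sum([12, 30, 7]) == 49` without any party seeing another's input. Inputs are assumed to be non-negative integers smaller than `MODULUS`; real values need a fixed-point encoding.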
Knowledge discovery is one of the main goals of Artificial Intelligence. This knowledge is usually stored in databases spread across different environments, which makes accessing and extracting data from them tedious or even impossible. Moreover, these data sources may contain private data, so the information must never leave its source. Privacy Preserving Machine Learning (PPML) helps overcome this difficulty by employing cryptographic techniques that allow knowledge discovery while ensuring data privacy. K-means is one of the data mining techniques used to discover knowledge: it groups data points into clusters of points with similar features. This paper focuses on Privacy Preserving Machine Learning applied to K-means, using recent protocols from the field of cryptography. The algorithm is applied to different scenarios in which data may be distributed either horizontally or vertically.
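To make the horizontal and vertical scenarios concrete, here is a minimal sketch, not the paper's algorithm, of K-means (Lloyd) steps under each partitioning. With horizontal partitioning each party holds different records; it computes per-cluster coordinate sums and counts locally, and only their totals are revealed through an additive secret-sharing sum. With vertical partitioning each party holds different features of the same records and contributes a partial squared distance to each centroid. `MODULUS`, `SCALE`, `horizontal_step`, and `vertical_assign` are hypothetical names. The sketch assumes public centroids and, in the vertical case, reveals the aggregated distances, which dedicated protocols typically avoid by computing the nearest centroid securely.

```python
import secrets

# Hypothetical parameters (not from the paper): a public prime modulus and a
# fixed-point scale used to encode real-valued coordinates as integers.
MODULUS = 2**61 - 1
SCALE = 10**6


def encode(x: float) -> int:
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    # Integers above MODULUS // 2 encode negative numbers.
    return (v - MODULUS if v > MODULUS // 2 else v) / SCALE


def secure_sum(values: list[float]) -> float:
    """Additively secret-share each party's value and reveal only the total."""
    n = len(values)
    dealt = []
    for x in values:
        shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
        shares.append((encode(x) - sum(shares)) % MODULUS)
        dealt.append(shares)
    # Party j adds the j-th share from every dealer; only these sums are published.
    partials = [sum(row[j] for row in dealt) % MODULUS for j in range(n)]
    return decode(sum(partials) % MODULUS)


def sq_dist(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))


def horizontal_step(partitions, centroids):
    """One Lloyd iteration when each party holds different records (rows)."""
    k, d = len(centroids), len(centroids[0])
    stats = []
    for rows in partitions:  # computed locally by each party
        sums, counts = [[0.0] * d for _ in range(k)], [0] * k
        for p in rows:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            counts[nearest] += 1
            for t in range(d):
                sums[nearest][t] += p[t]
        stats.append((sums, counts))
    new_centroids = []
    for j in range(k):
        n_j = secure_sum([cnt[j] for _, cnt in stats])
        if n_j == 0:
            new_centroids.append(list(centroids[j]))  # keep an empty cluster's centroid
            continue
        new_centroids.append(
            [secure_sum([s[j][t] for s, _ in stats]) / n_j for t in range(d)]
        )
    return new_centroids


def vertical_assign(column_blocks, centroid_blocks):
    """Cluster assignment when each party holds different features (columns).

    column_blocks[i][r] is party i's feature slice of record r, and
    centroid_blocks[i][j] is party i's slice of centroid j.
    """
    n_records, k = len(column_blocks[0]), len(centroid_blocks[0])
    labels = []
    for r in range(n_records):
        # Each party computes its partial squared distance; only the total is revealed.
        dists = [
            secure_sum([sq_dist(cols[r], cents[j])
                        for cols, cents in zip(column_blocks, centroid_blocks)])
            for j in range(k)
        ]
        labels.append(min(range(k), key=dists.__getitem__))
    return labels
```

For instance, with two parties holding the records `[0, 0], [0, 1]` and `[10, 10], [10, 11]` respectively and initial centroids `[0, 0]` and `[10, 10]`, one horizontal step moves the centroids to `[0.0, 0.5]` and `[10.0, 10.5]`. Splitting the same four records by column instead (one party holds the first feature, the other the second) gives the assignment `[0, 0, 1, 1]`.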