CROct 11, 2021

Privacy-Preserving Feature Selection with Fully Homomorphic Encryption

arXiv:2110.05088v37 citations
Originality Synthesis-oriented
AI Analysis

This addresses privacy concerns for semi-honest parties in distributed data settings, though it is incremental as it adapts an existing method to encrypted data.

The paper tackles the problem of privacy-preserving feature selection on distributed datasets by proposing an algorithm based on fully homomorphic encryption, which directly simulates the CWC algorithm on ciphertext and is implemented to show effectiveness on practical data.

For the feature selection problem, we propose an efficient privacy-preserving algorithm. Let $D$, $F$, and $C$ be data, feature, and class sets, respectively, where the feature value $x(F_i)$ and the class label $x(C)$ are given for each $x\in D$ and $F_i \in F$. For a triple $(D,F,C)$, the feature selection problem is to find a consistent and minimal subset $F' \subseteq F$, where `consistent' means that, for any $x,y\in D$, $x(C)=y(C)$ if $x(F_i)=y(F_i)$ for $F_i\in F'$, and `minimal' means that any proper subset of $F'$ is no longer consistent. On distributed datasets, we consider feature selection as a privacy-preserving problem: Assume that semi-honest parties $\textsf A$ and $\textsf B$ have their own personal $D_{\textsf A}$ and $D_{\textsf B}$. The goal is to solve the feature selection problem for $D_{\textsf A}\cup D_{\textsf B}$ without revealing their privacy. In this paper, we propose a secure and efficient algorithm based on fully homomorphic encryption, and we implement our algorithm to show its effectiveness for various practical data. The proposed algorithm is the first one that can directly simulate the CWC (Combination of Weakest Components) algorithm on ciphertext, which is one of the best performers for the feature selection problem on the plaintext.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes