10.4CRMar 18
DDH-based schemes for multi-party Function Secret SharingMarc Damie, Florian Hahn, Andreas Peter et al.
Function Secret Sharing (FSS) schemes enable sharing efficiently secret functions. Schemes dedicated to point functions, referred to as Distributed Point Functions (DPFs), are the center of FSS literature thanks to their numerous applications including private information retrieval, anonymous communications, and machine learning. While two-party DPFs benefit from schemes with logarithmic key sizes, multi-party DPFs have seen limited advancements: $O(\sqrt{N})$ key sizes (with $N$, the function domain size) and/or exponential factors in the key size. We propose a DDH-based technique reducing the key size of existing multi-party schemes. In particular, we build an honest-majority DPF with $O(\sqrt[3]{N})$ key size. Our benchmark highlights key sizes up to $10\times$ smaller (on realistic problem sizes) than state-of-the-art schemes. Finally, we extend our technique to schemes supporting comparison functions.
CRFeb 26, 2025
Evaluating Membership Inference Attacks in heterogeneous-data setupsBram van Dartel, Marc Damie, Florian Hahn
Among all privacy attacks against Machine Learning (ML), membership inference attacks (MIA) attracted the most attention. In these attacks, the attacker is given an ML model and a data point, and they must infer whether the data point was used for training. The attacker also has an auxiliary dataset to tune their inference algorithm. Attack papers commonly simulate setups in which the attacker's and the target's datasets are sampled from the same distribution. This setting is convenient to perform experiments, but it rarely holds in practice. ML literature commonly starts with similar simplifying assumptions (i.e., "i.i.d." datasets), and later generalizes the results to support heterogeneous data distributions. Similarly, our work makes a first step in the generalization of the MIA evaluation to heterogeneous data. First, we design a metric to measure the heterogeneity between any pair of tabular data distributions. This metric provides a continuous scale to analyze the phenomenon. Second, we compare two methodologies to simulate a data heterogeneity between the target and the attacker. These setups provide opposite performances: 90% attack accuracy vs. 50% (i.e., random guessing). Our results show that the MIA accuracy depends on the experimental setup; and even if research on MIA considers heterogeneous data setups, we have no standardized baseline of how to simulate it. The lack of such a baseline for MIA experiments poses a significant challenge to risk assessments in real-world machine learning scenarios.
CROct 16, 2025
Secure Sparse Matrix Multiplications and their Applications to Privacy-Preserving Machine LearningMarc Damie, Florian Hahn, Andreas Peter et al.
To preserve privacy, multi-party computation (MPC) enables executing Machine Learning (ML) algorithms on secret-shared or encrypted data. However, existing MPC frameworks are not optimized for sparse data. This makes them unsuitable for ML applications involving sparse data, e.g., recommender systems or genomics. Even in plaintext, such applications involve high-dimensional sparse data, that cannot be processed without sparsity-related optimizations due to prohibitively large memory requirements. Since matrix multiplication is central in ML algorithms, we propose MPC algorithms to multiply secret sparse matrices. On the one hand, our algorithms avoid the memory issues of the "dense" data representation of classic secure matrix multiplication algorithms. On the other hand, our algorithms can significantly reduce communication costs (some experiments show a factor 1000) for realistic problem sizes. We validate our algorithms in two ML applications in which existing protocols are impractical. An important question when developing MPC algorithms is what assumptions can be made. In our case, if the number of non-zeros in a row is a sensitive piece of information then a short runtime may reveal that the number of non-zeros is small. Existing approaches make relatively simple assumptions, e.g., that there is a universal upper bound to the number of non-zeros in a row. This often doesn't align with statistical reality, in a lot of sparse datasets the amount of data per instance satisfies a power law. We propose an approach which allows adopting a safe upper bound on the distribution of non-zeros in rows/columns of sparse matrices.
CRJul 2, 2025
How to Securely Shuffle? A survey about Secure Shufflers for privacy-preserving computationsMarc Damie, Florian Hahn, Andreas Peter et al.
Ishai et al. (FOCS'06) introduced secure shuffling as an efficient building block for private data aggregation. Recently, the field of differential privacy has revived interest in secure shufflers by highlighting the privacy amplification they can provide in various computations. Although several works argue for the utility of secure shufflers, they often treat them as black boxes; overlooking the practical vulnerabilities and performance trade-offs of existing implementations. This leaves a central question open: what makes a good secure shuffler? This survey addresses that question by identifying, categorizing, and comparing 26 secure protocols that realize the necessary shuffling functionality. To enable a meaningful comparison, we adapt and unify existing security definitions into a consistent set of properties. We also present an overview of privacy-preserving technologies that rely on secure shufflers, offer practical guidelines for selecting appropriate protocols, and outline promising directions for future work.
CRApr 10, 2025
Privacy-Preserving Vertical K-Means ClusteringFederico Mazzone, Trevor Brown, Florian Kerschbaum et al.
Clustering is a fundamental data processing task used for grouping records based on one or more features. In the vertically partitioned setting, data is distributed among entities, with each holding only a subset of those features. A key challenge in this scenario is that computing distances between records requires access to all distributed features, which may be privacy-sensitive and cannot be directly shared with other parties. The goal is to compute the joint clusters while preserving the privacy of each entity's dataset. Existing solutions using secret sharing or garbled circuits implement privacy-preserving variants of Lloyd's algorithm but incur high communication costs, scaling as O(nkt), where n is the number of data points, k the number of clusters, and t the number of rounds. These methods become impractical for large datasets or several parties, limiting their use to LAN settings only. On the other hand, a different line of solutions rely on differential privacy (DP) to outsource the local features of the parties to a central server. However, they often significantly degrade the utility of the clustering outcome due to excessive noise. In this work, we propose a novel solution based on homomorphic encryption and DP, reducing communication complexity to O(n+kt). In our method, parties securely outsource their features once, allowing a computing party to perform clustering operations under encryption. DP is applied only to the clusters' centroids, ensuring privacy with minimal impact on utility. Our solution clusters 100,000 two-dimensional points into five clusters using only 73MB of communication, compared to 101GB for existing works, and completes in just under 3 minutes on a 100Mbps network, whereas existing works take over 1 day. This makes our solution practical even for WAN deployments, all while maintaining accuracy comparable to plaintext k-means algorithms.
CRJan 26, 2021
Biometric Verification Secure Against Malicious AdversariesAmina Bassit, Florian Hahn, Joep Peeters et al.
Biometric verification has been widely deployed in current authentication solutions as it proves the physical presence of individuals. To protect the sensitive biometric data in such systems, several solutions have been developed that provide security against honest-but-curious (semi-honest) attackers. However, in practice attackers typically do not act honestly and multiple studies have shown drastic biometric information leakage in such honest-but-curious solutions when considering dishonest, malicious attackers. In this paper, we propose a provably secure biometric verification protocol to withstand malicious attackers and prevent biometric data from any sort of leakage. The proposed protocol is based on a homomorphically encrypted log likelihood-ratio-based (HELR) classifier that supports any biometric modality (e.g. face, fingerprint, dynamic signature, etc.) encoded as a fixed-length real-valued feature vector and performs an accurate and fast biometric recognition. Our protocol, that is secure against malicious adversaries, is designed from a protocol secure against semi-honest adversaries enhanced by zero-knowledge proofs. We evaluate both protocols for various security levels and record a sub-second speed (between $0.37$s and $0.88$s) for the protocol against semi-honest adversaries and between $0.95$s and $2.50$s for the protocol secure against malicious adversaries.
CRMar 14, 2017
HardIDX: Practical and Secure Index with SGXBenny Fuhry, Raad Bahmani, Ferdinand Brasser et al.
Software-based approaches for search over encrypted data are still either challenged by lack of proper, low-leakage encryption or slow performance. Existing hardware-based approaches do not scale well due to hardware limitations and software designs that are not specifically tailored to the hardware architecture, and are rarely well analyzed for their security (e.g., the impact of side channels). Additionally, existing hardware-based solutions often have a large code footprint in the trusted environment susceptible to software compromises. In this paper we present HardIDX: a hardware-based approach, leveraging Intel's SGX, for search over encrypted data. It implements only the security critical core, i.e., the search functionality, in the trusted environment and resorts to untrusted software for the remainder. HardIDX is deployable as a highly performant encrypted database index: it is logarithmic in the size of the index and searches are performed within a few milliseconds rather than seconds. We formally model and prove the security of our scheme showing that its leakage is equivalent to the best known searchable encryption schemes. Our implementation has a very small code and memory footprint yet still scales to virtually unlimited search index sizes, i.e., size is limited only by the general - non-secure - hardware resources.