LG, CR · Jun 8, 2024

Privacy-Preserving Optimal Parameter Selection for Collaborative Clustering

arXiv:2406.05545v1

Originality: Synthesis-oriented

AI Analysis

This work addresses privacy concerns in collaborative data sharing among multiple data owners, but it is incremental: it applies existing differential privacy methods to a specific clustering context.

This study tackles the problem of selecting optimal parameters for collaborative clustering while preserving data privacy. It finds that differential privacy techniques such as Randomized Response can maintain high-quality clustering (measured by Adjusted Rand Index and Silhouette Score) with minimal impact on the server's recommendations, although the risk of membership inference attacks increases as the privacy parameter ε rises.

This study investigates the optimal selection of parameters for collaborative clustering while ensuring data privacy. We focus on key clustering algorithms within a collaborative framework, where multiple data owners combine their data. A semi-trusted server assists in recommending the most suitable clustering algorithm and its parameters. Our findings indicate that the privacy parameter ($ε$) minimally impacts the server's recommendations, but an increase in $ε$ raises the risk of membership inference attacks, where sensitive information might be inferred. To mitigate these risks, we implement differential privacy techniques, particularly the Randomized Response mechanism, to add noise and protect data privacy. Our approach demonstrates that high-quality clustering can be achieved while maintaining data confidentiality, as evidenced by metrics such as the Adjusted Rand Index and Silhouette Score. This study contributes to privacy-aware data sharing, optimal algorithm and parameter selection, and effective communication between data owners and the server.
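To make the role of $ε$ concrete, here is a minimal sketch of the textbook binary Randomized Response mechanism. The abstract does not say which variant the authors use, so the binary-attribute setting, the function names, and the sample data below are all assumptions made for illustration. Each bit is reported truthfully with probability $e^{ε}/(1+e^{ε})$ and flipped otherwise, so a larger $ε$ means less noise and weaker privacy.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bits, epsilon, rng=None):
    """Keep each bit with probability e^eps / (1 + e^eps); otherwise flip it."""
    rng = rng or random.Random()
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_true_mean(noisy_bits, epsilon):
    """Unbiased estimate of the fraction of 1s before perturbation.

    Requires epsilon > 0, otherwise the reports carry no signal (p = 0.5).
    """
    p = keep_probability(epsilon)
    observed = sum(noisy_bits) / len(noisy_bits)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical binary attribute held by one data owner: 30% ones.
    data = [1] * 3000 + [0] * 7000
    for eps in (0.5, 1.0, 4.0):
        noisy = randomized_response(data, eps, rng)
        flipped = sum(a != b for a, b in zip(data, noisy)) / len(data)
        estimate = estimate_true_mean(noisy, eps)
        print(f"eps={eps}: flipped={flipped:.3f}, estimated mean={estimate:.3f}")
```

The aggregate estimate stays close to the true fraction even when many individual bits are flipped. That is one plausible reason a server's recommendations could be fairly insensitive to $ε$, while individual records become easier to infer as $ε$ grows.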
