Differentially Private Preference Data Synthesis for Large Language Model Alignment

arXiv:2605.30808

AI Analysis

This work provides a privacy-preserving method for LLM alignment, relevant to developers and users concerned about exposing sensitive user prompts and human judgments contained in preference datasets. The authors state that, to the best of their knowledge, it is the first work to generate DP synthetic preference data for LLM alignment.

The paper introduces DPPrefSyn, a novel algorithm that generates differentially private (DP) synthetic preference data for aligning Large Language Models (LLMs) with human values. This method addresses privacy concerns by learning an underlying preference model from private data with DP guarantees and then synthesizing high-quality preference data using this model and public prompts. The experimental results show that DPPrefSyn achieves competitive alignment performance while maintaining strong DP guarantees.
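The pipeline summarized above has two stages: learn a preference model privately, then label public data with it. The sketch below illustrates both stages under several assumptions.

- The reward is a linear Bradley-Terry model, `theta @ phi(x, y)`, over fixed, precomputed response features.
- The fitted weights are privatized with Gaussian output perturbation for L2-regularized empirical risk minimization. This is a standard DP technique swapped in here for illustration.
- The functions `fit_dp_bradley_terry` and `synthesize_pairs`, the feature map `phi`, and all parameter values are hypothetical.

This is not the DPPrefSyn implementation from the linked repository.

```python
import numpy as np


def fit_dp_bradley_terry(diffs, lam, epsilon, delta, iters=2000, rng=None):
    """Fit a linear Bradley-Terry reward, then privatize it by output perturbation.

    diffs[i] = phi(x_i, y_chosen) - phi(x_i, y_rejected) for one private comparison,
    and the model is P(chosen > rejected) = sigmoid(theta @ diffs[i]).
    Assumed mechanism (not necessarily the paper's): Gaussian noise on the
    minimizer of the L2-regularized logistic loss. With every row clipped to
    norm <= 1, replacing one comparison moves that minimizer by at most
    2 / (n * lam) in L2 norm. The classic Gaussian-mechanism calibration below
    assumes 0 < epsilon < 1 and that gradient descent has converged.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Clip each difference vector to L2 norm <= 1 so the sensitivity bound holds.
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    diffs = diffs / np.maximum(norms, 1.0)
    n, d = diffs.shape
    theta = np.zeros(d)
    step = 1.0 / (0.25 + lam)  # 1 / smoothness constant of the objective
    for _ in range(iters):
        margins = diffs @ theta
        # sigmoid(-m) = exp(-log(1 + exp(m))), computed without overflow.
        weights = np.exp(-np.logaddexp(0.0, margins))
        # Gradient of mean(log(1 + exp(-m))) + (lam / 2) * ||theta||^2.
        grad = -(diffs * weights[:, None]).mean(axis=0) + lam * theta
        theta -= step * grad
    sensitivity = 2.0 / (n * lam)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return theta + rng.normal(0.0, sigma, size=d)


def synthesize_pairs(theta_dp, feats_a, feats_b):
    """Order two candidate responses per public prompt by the DP reward.

    feats_a, feats_b: (m, d) features of the two responses. Returns the
    (chosen, rejected) feature arrays; the higher-reward response is chosen.
    """
    a_wins = (feats_a @ theta_dp) >= (feats_b @ theta_dp)
    chosen = np.where(a_wins[:, None], feats_a, feats_b)
    rejected = np.where(a_wins[:, None], feats_b, feats_a)
    return chosen, rejected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 5, 5000
    theta_true = np.array([2.0, -1.0, 0.5, 0.0, 1.0])
    x = rng.normal(0.0, 1.0 / np.sqrt(d), size=(n, d))
    # Simulated private comparisons: orient each row as (chosen - rejected)
    # by sampling the winner from the true Bradley-Terry probability.
    p = 1.0 / (1.0 + np.exp(-(x @ theta_true)))
    sign = np.where(rng.random(n) < p, 1.0, -1.0)
    theta_dp = fit_dp_bradley_terry(sign[:, None] * x, lam=0.1, epsilon=0.5,
                                    delta=1e-5, rng=rng)
    cos = theta_dp @ theta_true / (np.linalg.norm(theta_dp) * np.linalg.norm(theta_true))
    print(f"cosine(theta_dp, theta_true) = {cos:.3f}")
```

`synthesize_pairs` reads only `theta_dp` and public inputs. The synthetic pairs therefore inherit the (ε, δ) guarantee of `theta_dp` through the post-processing property of differential privacy.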

Preference alignment is a crucial post-training step for large language models (LLMs) to ensure their outputs align with human values. However, post-training on real human preference data raises privacy concerns, as these datasets often contain sensitive user prompts and human judgments. To address this, we propose DPPrefSyn, a novel algorithm for generating differentially private (DP) synthetic preference data to enable privacy-preserving preference alignment. DPPrefSyn is a principled framework grounded in the Bradley-Terry preference model and the intrinsic geometric structure of pairwise human preference data. It first learns an underlying preference model from private data with formal differential privacy guarantees, and then leverages the learned model together with public prompts to synthesize high-quality preference data. It exploits the shared linear structure of per-cluster reward models to effectively capture heterogeneous human preferences in private datasets, and leverages DP Principal Component Analysis (DP-PCA) to improve learning accuracy. Extensive experimental results demonstrate that DPPrefSyn achieves competitive alignment performance under strong DP guarantees. These findings highlight the potential of synthetic preference data as a practical alternative for privacy-preserving preference alignment across a broad range of applications. To the best of our knowledge, this is the first work to generate DP synthetic preference data for LLM alignment. Our code is available at https://github.com/gfengyu/Differentially-Private-Preference-Data-Synthesis.
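The abstract combines two structural ideas: per-cluster reward models that share a linear structure, and DP-PCA to improve accuracy. Both can be sketched under one simplifying assumption. Each cluster k has reward weights `theta_k = U @ w_k`, where `U` is a shared low-dimensional basis estimated privately from clipped feature differences.

The DP-PCA variant below is an assumed stand-in: it adds Gaussian noise to the second-moment matrix, in the style of the "Analyze Gauss" mechanism. The names `dp_pca_subspace` and `cluster_rewards` and the cluster weights `W` are hypothetical. The paper's actual clustering and DP-PCA procedure may differ.

```python
import numpy as np


def dp_pca_subspace(diffs, r, epsilon, delta, rng=None):
    """Estimate a shared r-dim subspace of preference features with DP-PCA.

    Assumed technique (not necessarily the paper's): Gaussian noise on the
    second-moment matrix S = sum_i d_i d_i^T. With rows clipped to norm <= 1,
    adding or removing one comparison changes S by at most 1 in Frobenius norm.
    The classic Gaussian-mechanism calibration assumes 0 < epsilon < 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    diffs = diffs / np.maximum(norms, 1.0)
    d = diffs.shape[1]
    S = diffs.T @ diffs
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    # Symmetric noise: draw the upper triangle (with diagonal) and mirror it.
    upper = np.triu(rng.normal(0.0, sigma, size=(d, d)))
    noise = upper + np.triu(upper, 1).T
    _, eigvecs = np.linalg.eigh(S + noise)  # eigenvalues in ascending order
    return eigvecs[:, -r:]  # (d, r) orthonormal basis of the top-r subspace


def cluster_rewards(U, W, feats):
    """Per-cluster linear rewards sharing one subspace: theta_k = U @ W[:, k].

    U: (d, r) shared basis, W: (r, K) hypothetical per-cluster weights,
    feats: (m, d) response features. Returns an (m, K) reward matrix.
    """
    return (feats @ U) @ W


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, r, n = 10, 2, 20000
    # Synthetic differences that lie (almost) in a hidden 2-dim subspace.
    U_true, _ = np.linalg.qr(rng.normal(size=(d, r)))
    diffs = rng.normal(0.0, 0.5, size=(n, r)) @ U_true.T + rng.normal(0.0, 0.01, size=(n, d))
    U_hat = dp_pca_subspace(diffs, r, epsilon=0.5, delta=1e-5, rng=rng)
    err = np.linalg.norm(U_hat @ U_hat.T - U_true @ U_true.T)
    print(f"projection distance = {err:.3f}")
```

Only the subspace estimate is released by this step. Estimating the per-cluster weights `W` from projected private features `feats @ U_hat` would be a further private step, which consumes additional privacy budget under the composition property of differential privacy.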
