CR AIMar 10, 2024

FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning

Zhuo Zhang, Jingyuan Zhang, Jintao Huang, Lizhen Qu, Hongzhi Zhang, Qifan Wang, Xun Zhou, Zenglin Xu

arXiv:2403.06131v28.54 citationsh-index: 7Has Code

Originality Incremental advance

AI Analysis

This addresses privacy and data limitations in federated learning for instruction tuning, but it is incremental as it builds on existing federated and few-shot methods.

The paper tackles the problem of privacy risks and data scarcity in federated instruction tuning for large language models by proposing FewFedPIT, which uses synthetic data generation, parameter isolation, and local aggregation to enhance privacy and performance, achieving improved results on three datasets.

Instruction tuning has been identified as a crucial technique for optimizing the performance of large language models (LLMs) in generating human-aligned responses. Nonetheless, gathering diversified and superior-quality instruction data for such tuning presents notable obstacles, especially in domains with rigid privacy provisions. Federated instruction tuning (FedIT) has emerged as a promising solution, by consolidating collaborative training across multiple data owners, thereby resulting in a privacy-preserving learning model. However, FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks. In this paper, we propose a novel federated algorithm, FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning. FewFedPITcomprises three vital components on the client side: (1) synthetic data generation, which utilizes LLMs' in-context learning capacity to generate synthetic data autonomously, thus expanding the local database; (2) parameter isolation training, which individually updates the public parameters in the synthetic data and the private parameters in the local data, consequently mitigating the noise impact of the synthetic data; (3) local aggregation sharing, which mixes public and private parameters before uploading, effectively preventing data extraction attacks. Extensive experiments on three open-source datasets demonstrate the effectiveness of FewFedPITin, enhancing privacy preservation and improving federated few-shot performance.

View on arXiv PDF

Similar