LG AI DC
Mar 7, 2024

Enhancing Data Quality in Federated Fine-Tuning of Foundation Models

Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang

arXiv:2403.04529v1
7.96 citations
h-index: 18

Originality: Incremental advance

AI Analysis

This addresses the challenge of incorporating high-quality private data without sharing it, an incremental step toward scaling foundation models.

The paper tackles the problem of data quality control in federated fine-tuning of foundation models by proposing a pipeline that scores training data and sets a global threshold, resulting in improved model performance.

In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research. To further scale up, it is crucial to incorporate collaboration among multiple specialized and high-quality private domain data sources. However, the challenge of training models locally without sharing private data presents numerous obstacles in data quality control. To tackle this issue, we propose a data quality control pipeline for federated fine-tuning of foundation models. This pipeline computes scores reflecting the quality of training data and determines a global threshold for a unified standard, aiming for improved global performance. Our experiments show that the proposed quality control pipeline facilitates the effectiveness and reliability of the model training, leading to better performance.
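
The score-then-threshold idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's method: the scorer `quality_score` is a toy heuristic, the threshold is taken as the mean score over a small shared reference set, and the names `global_threshold` and `filter_client_data` are hypothetical. The only point is the shape of the pipeline: each client scores its own data locally, one threshold is applied everywhere, and raw samples never leave the client.

```python
from statistics import mean


def quality_score(sample: str) -> float:
    """Toy placeholder scorer (assumption): share of letters and spaces."""
    if not sample:
        return 0.0
    clean = sum(ch.isalpha() or ch.isspace() for ch in sample)
    return clean / len(sample)


def global_threshold(reference_samples: list[str]) -> float:
    """Assumed rule: the unified threshold is the mean reference score."""
    return mean(quality_score(s) for s in reference_samples)


def filter_client_data(samples: list[str], threshold: float) -> list[str]:
    """Runs on each client; only samples at or above the threshold are kept."""
    return [s for s in samples if quality_score(s) >= threshold]


# Hypothetical shared reference set and private client datasets.
reference = ["Clean reference text.", "Another clean sentence."]
clients = {
    "client_a": ["a well formed example", "@@@###123"],
    "client_b": ["noisy!!! 0101 ###", "plain readable text"],
}

threshold = global_threshold(reference)
kept = {name: filter_client_data(data, threshold) for name, data in clients.items()}
```

In this sketch each client would then fine-tune on its `kept` samples only. A real scorer would more plausibly rely on a model-based signal than on character counts.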

