Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer
CrossLM addresses privacy concerns in fine-tuning LLMs for specific tasks, which benefits applications where data is siloed across clients.
The paper tackles the problem of improving the task-specific performance of large language models (LLMs) without accessing private data. It proposes CrossLM, a method that enables mutual enhancement between LLMs and smaller language models (SLMs) through cross-silo knowledge transfer. The reported result is a significant performance gain for both models while the LLM's generalization capability is preserved.
While large language models (LLMs) are empowered with broad knowledge, their task-specific performance is often suboptimal. Improving it requires fine-tuning LLMs with task-specific data, but such data may be inaccessible due to privacy concerns. In this paper, we propose a novel approach to enhance LLMs with smaller language models (SLMs) that are trained on clients using their private task-specific data. To enable mutual enhancement between LLMs and SLMs, we propose CrossLM, where the SLMs steer the LLM toward generating high-quality task-specific data, and both the LLM and the SLMs are enhanced with the generated data. We evaluate CrossLM using publicly accessible language models across a range of benchmark tasks. The results demonstrate that CrossLM significantly enhances the task-specific performance of the SLMs on clients and the LLM on the cloud server simultaneously, while preserving the LLM's generalization capability.
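The loop described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the classes `ToyLLM` and `ToySLM`, the keyword-overlap score, the weight update, and the `threshold` parameter are all hypothetical stand-ins chosen so the example runs without any model library. It only shows the data flow: the server-side model generates synthetic samples, client-side models score them without exposing private data, the scores feed back into the generator, and the accepted samples are passed to the clients.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToySLM:
    """Hypothetical stand-in for a client-side small model."""
    private_keywords: set                      # proxy for knowledge learned from private data
    seen: list = field(default_factory=list)   # synthetic samples received so far

    def score(self, text: str) -> float:
        # Crude quality proxy (assumption): fraction of tokens in the task vocabulary.
        tokens = text.split()
        return sum(t in self.private_keywords for t in tokens) / max(len(tokens), 1)

    def update(self, samples: list) -> None:
        # Stand-in for training on synthetic data; here we only store it.
        self.seen.extend(samples)


@dataclass
class ToyLLM:
    """Hypothetical stand-in for the server-side generator."""
    vocab: list
    weights: dict = field(default_factory=dict)  # per-token sampling weights

    def generate(self, rng: random.Random, length: int = 5) -> str:
        w = [self.weights.get(t, 1.0) for t in self.vocab]
        return " ".join(rng.choices(self.vocab, weights=w, k=length))

    def update(self, scored_samples: list, lr: float = 1.0) -> None:
        # Raise the weight of tokens that appeared in highly scored samples.
        for text, score in scored_samples:
            for t in text.split():
                self.weights[t] = self.weights.get(t, 1.0) + lr * score


def crosslm_round(llm, slms, rng, n_samples=20, threshold=0.5):
    """One illustrative round: generate, score on clients, update both sides."""
    samples = [llm.generate(rng) for _ in range(n_samples)]
    # Only scores leave the clients; private data never reaches the server.
    scored = [(s, sum(m.score(s) for m in slms) / len(slms)) for s in samples]
    llm.update(scored)
    kept = [s for s, sc in scored if sc >= threshold]
    for m in slms:
        m.update(kept)
    return scored


if __name__ == "__main__":
    rng = random.Random(0)
    llm = ToyLLM(vocab=["refund", "invoice", "weather", "movie"])
    slms = [ToySLM({"refund", "invoice"}), ToySLM({"invoice"})]
    for _ in range(3):
        crosslm_round(llm, slms, rng)
    print(llm.weights)
```

In a real system the scoring and update steps would be replaced by actual model inference and training; the sketch keeps them trivial so that the direction of information flow between server and clients stays visible.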