AI | CL | Dec 10, 2023

Mutual Enhancement of Large and Small Language Models with Cross-Silo Knowledge Transfer

arXiv:2312.05842v1 | 17 citations
Originality: Highly original
AI Analysis

CrossLM addresses privacy concerns in fine-tuning LLMs for specific tasks, benefiting applications where data is siloed across clients.

The paper tackles the problem of improving the task-specific performance of large language models (LLMs) without access to private data. It proposes CrossLM, a method for mutual enhancement between LLMs and smaller language models (SLMs) through cross-silo knowledge transfer, and reports significant performance gains for both models while preserving the LLM's generalization capability.

While large language models (LLMs) are empowered with broad knowledge, their task-specific performance is often suboptimal. This necessitates fine-tuning LLMs with task-specific data, but such data may be inaccessible due to privacy concerns. In this paper, we propose a novel approach to enhance LLMs with smaller language models (SLMs) that are trained on clients using their private task-specific data. To enable mutual enhancement between LLMs and SLMs, we propose CrossLM, where the SLMs promote the LLM to generate task-specific high-quality data, and both the LLM and SLMs are enhanced with the generated data. We evaluate CrossLM using publicly accessible language models across a range of benchmark tasks. The results demonstrate that CrossLM significantly enhances the task-specific performance of SLMs on clients and the LLM on the cloud server simultaneously while preserving the LLM's generalization capability.
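
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyClient`, the functions `toy_llm_generate` and `crosslm_round`, the keyword-matching "SLM", and the mean-score acceptance threshold are all hypothetical stand-ins chosen so the example runs without any model. It only shows the shape of one round: the cloud side generates candidate task data, each client's SLM scores it using knowledge derived from private data that never leaves the client, and the accepted samples become shared training data for both sides.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Sample = Tuple[str, str]  # (generated text, label the generator was asked for)


@dataclass
class ToyClient:
    """Hypothetical client. `keywords` stands in for an SLM trained on private data."""
    name: str
    keywords: Dict[str, Set[str]]

    def feedback(self, text: str, label: str) -> float:
        """Return 1.0 if this client's 'SLM' agrees that `text` fits `label`, else 0.0."""
        words = set(text.lower().split())
        return 1.0 if words & self.keywords.get(label, set()) else 0.0


def toy_llm_generate(label: str) -> List[str]:
    """Stand-in for prompting the cloud LLM to synthesize text for `label`."""
    candidates = {
        "positive": ["a great and moving film", "a dull and tedious film"],
        "negative": ["a boring waste of time", "a wonderful surprise"],
    }
    return candidates[label]


def crosslm_round(clients: List[ToyClient], labels: List[str],
                  threshold: float = 0.5) -> List[Sample]:
    """One toy round: generate, collect client feedback, keep well-rated samples.

    Assumption: filtering by mean feedback is a simplification. The returned
    samples represent the generated data that both the LLM and the SLMs would
    then be trained on; only scores, never private data, leave the clients.
    """
    accepted: List[Sample] = []
    for label in labels:
        for text in toy_llm_generate(label):
            scores = [client.feedback(text, label) for client in clients]
            if sum(scores) / len(scores) >= threshold:
                accepted.append((text, label))
    return accepted


clients = [
    ToyClient("client_a", {"positive": {"great", "wonderful"},
                           "negative": {"boring", "dull"}}),
    ToyClient("client_b", {"positive": {"moving", "great"},
                           "negative": {"waste", "tedious"}}),
]
accepted = crosslm_round(clients, ["positive", "negative"])
```

In this sketch the two mislabeled candidates are rejected because neither client's scorer agrees with the requested label, which is the sense in which client-side models steer the generator toward usable task data.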

