LG · May 12

Beyond Parameter Aggregation: Semantic Consensus for Federated Fine-Tuning of LLMs

Amr Abourayya, Jens Kleesiek, Michael Kamp

arXiv:2605.1185716.3

Predicted impact: top 22% in LG (last 90 days) · Originality: Highly original

AI Analysis

For practitioners deploying heterogeneous LLMs in federated settings, this work offers a communication-efficient alternative that bypasses architectural constraints and white-box access requirements.

This paper proposes a federated fine-tuning method for LLMs that replaces parameter aggregation with semantic consensus on model outputs, reducing communication by orders of magnitude (e.g., 1006x for Llama3.1-405B) while matching strong baselines.

Federated fine-tuning of large language models is commonly formulated as a parameter aggregation problem. However, even parameter-efficient methods require transmitting large collections of trainable weights, assume aligned architectures, and rely on white-box access to model parameters. As model sizes continue to grow and deployments become increasingly heterogeneous, these assumptions become progressively misaligned with practical constraints. We consider an alternative formulation in which collaboration is mediated through model behavior rather than parameters. Clients fine-tune local models on private data and exchange generated outputs on a shared, public prompt set. The server maps these outputs into a semantic representation space, forms a per-prompt semantic consensus, and returns pseudo-labels for further local fine-tuning. This formulation fundamentally changes the communication scaling of federated LLM fine-tuning: the amount of information exchanged depends only on the public prompt budget and the size of the communicated behaviors, independent of model size. As a consequence, the protocol naturally accommodates heterogeneous architectures and applies directly to open-ended text generation. We present a theoretical analysis and empirical results demonstrating that this approach can match strong federated fine-tuning baselines while reducing communication by orders of magnitude (e.g., analytically by a factor of 1006 for Llama3.1-405B) and also lowering runtime and energy consumption. These results suggest that, for generative foundation models, behavior-level consensus provides a more appropriate abstraction for federated adaptation than parameter aggregation.
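The server-side step described above (map client outputs into a semantic space, form a per-prompt consensus, return pseudo-labels) can be sketched roughly as follows. This is not the authors' method: the abstract does not specify the representation space or the consensus operator, so this sketch assumes a hypothetical bag-of-words vector as a stand-in for a learned sentence encoder and uses medoid selection (the client output with the highest mean cosine similarity to the other clients' outputs) as the consensus rule. The names `embed`, `cosine`, `semantic_consensus`, and `server_round` are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a semantic encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_consensus(outputs):
    """Assumed consensus rule: return the medoid output, i.e. the one with
    the highest mean similarity to all other clients' outputs."""
    vectors = [embed(o) for o in outputs]

    def mean_sim(i):
        sims = [cosine(vectors[i], vectors[j]) for j in range(len(vectors)) if j != i]
        return sum(sims) / len(sims) if sims else 1.0

    best = max(range(len(outputs)), key=mean_sim)
    return outputs[best]


def server_round(client_outputs):
    """client_outputs maps each public prompt to the list of client generations.
    Returns one textual pseudo-label per prompt for further local fine-tuning."""
    return {prompt: semantic_consensus(outs) for prompt, outs in client_outputs.items()}


if __name__ == "__main__":
    round_outputs = {
        "What is the capital of France?": [
            "The capital of France is Paris.",
            "Paris is the capital of France.",
            "France's capital city is Lyon.",
        ],
    }
    print(server_round(round_outputs))
```

Choosing an actual client output rather than averaging embeddings keeps each pseudo-label as plain text that any client can fine-tune on, regardless of its architecture. Note that the server only ever handles generated text for the public prompts, never model weights, which mirrors the abstract's point that payload size is independent of model size.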
