LG · AI · Jan 20

Neural Organ Transplantation (NOT): Checkpoint-Based Modular Adaptation for Transformer Models

arXiv:2601.13580v1 · h-index: 2

Originality: Incremental advance

AI Analysis

NOT enables privacy-preserving expertise sharing through checkpoint distribution for decoder-only transformer models, although the contribution is incremental because it builds on existing modular adaptation concepts.

The paper tackles the problem of domain adaptation for transformer models by introducing Neural Organ Transplantation (NOT), a modular framework that extracts and trains contiguous layer subsets as reusable checkpoints, achieving an order-of-magnitude improvement in perplexity over LoRA while training faster.
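To make the "order-of-magnitude improvement in perplexity" claim concrete: perplexity is the exponential of the mean per-token negative log-likelihood (NLL), so a 10x lower perplexity always corresponds to a drop of ln 10 ≈ 2.30 nats in mean NLL, whatever the starting value. The minimal Python sketch below shows that relationship; the function names are illustrative, and the input is assumed to be the natural-log probabilities a model assigned to each ground-truth token in some evaluation run.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood) over the evaluated tokens.

    token_log_probs: natural-log probabilities of each ground-truth token
    (hypothetical input; any evaluation harness that yields these works).
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def nll_drop_for_ratio(ratio: float) -> float:
    """Reduction in mean NLL (nats per token) needed to divide perplexity by `ratio`."""
    return math.log(ratio)
```

For example, if a baseline scored a perplexity of 100 and an adapted model scored 10 (illustrative numbers, not results from the paper), the adapted model's mean NLL would be lower by `nll_drop_for_ratio(10)`, about 2.30 nats per token.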

We introduce Neural Organ Transplantation (NOT), a modular adaptation framework that enables trained transformer layers to function as reusable, transferable checkpoints for domain adaptation. Unlike conventional fine-tuning approaches that tightly couple trained parameters to specific model instances and training data, NOT extracts contiguous layer subsets ("donor organs") from pre-trained models, trains them independently on domain-specific data, and saves them as standalone checkpoint files that can be transplanted into compatible recipient models without access to the original training data. Through experiments on three decoder-only transformer architectures spanning 124M to 20B parameters (GPT-2, TinyLlama, and GPT-OSS), we demonstrate that donor transplantation substantially outperforms existing adaptation methods, achieving an order-of-magnitude improvement in perplexity over LoRA while training significantly faster. The method exhibits position dependence, with early insertion positions yielding optimal results. Cross-domain transfer at billion-parameter scale reveals unexpected regularization benefits. These findings demonstrate that transformer middle layers can support efficient modular transfer for decoder-only architectures, enabling privacy-preserving expertise sharing through checkpoint distribution. We note that this approach is currently limited to decoder-only models; preliminary experiments on encoder-based architectures show reduced effectiveness.
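The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the extract, save, and transplant data flow, not the authors' implementation. Several assumptions are made here: toy `Layer`/`Model` dataclasses stand in for real transformer blocks; JSON stands in for a real checkpoint format (a real system would more likely serialize tensor state dicts); "compatible" is taken to mean a matching hidden size; and transplanting is assumed to insert the donor block before a chosen recipient layer rather than replace existing layers. The domain-specific training of the donor is omitted. All names (`extract_donor`, `save_checkpoint`, `load_checkpoint`, `transplant`) are hypothetical.

```python
import copy
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Layer:
    """Toy stand-in for one transformer block (hypothetical; real blocks hold tensors)."""
    hidden_size: int
    weights: List[float]


@dataclass
class Model:
    """Toy decoder-only model: an ordered stack of blocks sharing one hidden size."""
    hidden_size: int
    layers: List[Layer]


def extract_donor(model: Model, start: int, end: int) -> List[Layer]:
    """Copy the contiguous block model.layers[start:end] as a standalone 'donor organ'."""
    if not 0 <= start < end <= len(model.layers):
        raise ValueError(f"invalid layer range [{start}, {end})")
    return copy.deepcopy(model.layers[start:end])


def save_checkpoint(donor: List[Layer], path: str) -> None:
    """Write only the donor's parameters; no training data is stored."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([asdict(layer) for layer in donor], f)


def load_checkpoint(path: str) -> List[Layer]:
    """Rebuild donor layers from a checkpoint written by save_checkpoint."""
    with open(path, encoding="utf-8") as f:
        return [Layer(**entry) for entry in json.load(f)]


def transplant(recipient: Model, donor: List[Layer], position: int) -> Model:
    """Return a new model with the donor inserted before recipient.layers[position].

    Assumes 'compatible' means every donor block matches the recipient's hidden size.
    The recipient is deep-copied, so the original model is left unchanged.
    """
    if any(layer.hidden_size != recipient.hidden_size for layer in donor):
        raise ValueError("donor hidden size does not match recipient")
    if not 0 <= position <= len(recipient.layers):
        raise ValueError(f"invalid insertion position {position}")
    layers = copy.deepcopy(recipient.layers)
    layers[position:position] = copy.deepcopy(donor)  # slice assignment inserts
    return Model(recipient.hidden_size, layers)


# Usage: move blocks 2-3 of one model into another at an early position.
base = Model(4, [Layer(4, [float(i)]) for i in range(6)])
donor = extract_donor(base, 2, 4)
# (Domain-specific training of `donor` would happen here; omitted in this sketch.)
path = os.path.join(tempfile.mkdtemp(), "donor.json")
save_checkpoint(donor, path)

recipient = Model(4, [Layer(4, [10.0 + i]) for i in range(6)])
patched = transplant(recipient, load_checkpoint(path), position=1)
print([layer.weights[0] for layer in patched.layers])
# prints [10.0, 2.0, 3.0, 11.0, 12.0, 13.0, 14.0, 15.0]
```

Because the checkpoint holds only layer parameters, it can be shared without the data it was trained on. To probe the position dependence the abstract reports, one could call `transplant` once per candidate `position` and compare held-out perplexity on the target domain.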
