CL LG Apr 24

Where Should LoRA Go? Component-Type Placement in Hybrid Language Models

Hector Borobia, Elies Seguí-Mas, Guillermina Tormo-Carbó

arXiv:2604.22127

AI Analysis

For practitioners fine-tuning hybrid LLMs, this work reveals that component-type LoRA placement is a critical design choice, with attention-only adaptation being surprisingly effective.

This paper studies LoRA placement in hybrid language models (attention + recurrent components) and finds that adapting only the attention pathway outperforms full-model adaptation with 5-10x fewer parameters, while adapting the recurrent backbone is destructive in sequential hybrids (-14.8 pp on GSM8K) but constructive in parallel ones (+8.6 pp).

Hybrid language models that interleave attention with recurrent components are increasingly competitive with pure Transformers, yet standard LoRA practice applies adapters uniformly without considering the distinct functional roles of each component type. We systematically study component-type LoRA placement across two hybrid architectures -- Qwen3.5-0.8B (sequential, GatedDeltaNet + softmax attention) and Falcon-H1-0.5B (parallel, Mamba-2 SSM + attention) -- fine-tuned on three domains and evaluated on five benchmarks. We find that adapting only the attention pathway -- despite it being the minority component -- consistently outperforms full-model adaptation while using 5-10x fewer trainable parameters. Crucially, adapting the recurrent backbone is destructive in sequential hybrids (-14.8 pp on GSM8K) but constructive in parallel ones (+8.6 pp). We further document a transfer asymmetry: parallel hybrids exhibit positive cross-task transfer, while sequential hybrids suffer catastrophic forgetting. These results establish that hybrid topology fundamentally determines adaptation response, and that component-aware LoRA placement is a necessary design dimension for hybrid architectures.
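To make "component-type placement" concrete, the following is a minimal sketch of how adapter targets might be selected by component type and how the trainable-parameter budget changes as a result. It is not the authors' code: the module names, layer shapes, block layout, and the helpers `select_targets` and `lora_param_count` are all hypothetical, and the resulting counts are toy numbers, not the paper's.

```python
import re

# Hypothetical toy hybrid: three recurrent blocks and one attention block.
# Names and (in_features, out_features) shapes are illustrative assumptions;
# real checkpoints use different module names and sizes.
TOY_LINEAR_LAYERS = {}
for i in range(3):
    TOY_LINEAR_LAYERS[f"layers.{i}.recurrent.in_proj"] = (1024, 4096)
    TOY_LINEAR_LAYERS[f"layers.{i}.recurrent.out_proj"] = (2048, 1024)
TOY_LINEAR_LAYERS.update({
    "layers.3.attn.q_proj": (1024, 1024),
    "layers.3.attn.k_proj": (1024, 256),
    "layers.3.attn.v_proj": (1024, 256),
    "layers.3.attn.o_proj": (1024, 1024),
})

# One name pattern per placement strategy (hypothetical naming scheme).
PLACEMENTS = {
    "attention_only": r"\.attn\.",
    "recurrent_only": r"\.recurrent\.",
    "full": r"\.(attn|recurrent)\.",
}


def select_targets(layers, placement):
    """Return the sorted names of linear layers that receive an adapter."""
    pattern = re.compile(PLACEMENTS[placement])
    return sorted(name for name in layers if pattern.search(name))


def lora_param_count(layers, targets, rank):
    """Count LoRA parameters: A is (rank x in), B is (out x rank) per layer."""
    return sum(rank * (layers[name][0] + layers[name][1]) for name in targets)


if __name__ == "__main__":
    for placement in PLACEMENTS:
        targets = select_targets(TOY_LINEAR_LAYERS, placement)
        count = lora_param_count(TOY_LINEAR_LAYERS, targets, rank=8)
        print(f"{placement}: {len(targets)} layers, {count} LoRA parameters")
```

In practice, a list of selected names like this could be handed to an adapter library's target-module setting (for example, the `target_modules` argument of `LoraConfig` in Hugging Face PEFT); the actual module names would need to be read from the specific model being fine-tuned.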
