AI · Dec 18, 2025

AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints

arXiv:2512.16245v1 · 1 citation · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses a practical problem for practitioners, the loss of alignment when models are merged. It is an incremental advance because it builds on existing merging techniques.

The paper tackles the problem that merging large language models can destroy their alignment. It introduces AlignMerge, a geometry-aware framework that explicitly preserves alignment during merging, and reports improved alignment metrics across five model families while matching or exceeding the best expert on instruction-following, reasoning, and helpfulness.

Merging large language models (LLMs) is a practical way to compose capabilities from multiple fine-tuned checkpoints without retraining. Yet standard schemes (linear weight soups, task vectors, and Fisher-weighted averaging) can preserve loss while quietly destroying alignment. We argue that merging is not a numerical trick but a geometry-constrained operation around an already-aligned anchor: fusion must be steered to respect safety geometry, not validated post hoc. We introduce AlignMerge, a geometry-aware merging framework that makes alignment an explicit invariant. In a local Fisher chart around an instruction-tuned base, we estimate an alignment subspace with projector P_A and optimize: L_AlignMerge = L_geo + lambda_align * L_align + lambda_bud * L_bud, where L_geo keeps the merge close to its experts in Fisher-Rao geometry, L_align penalizes motion along alignment-sensitive directions, and L_bud enforces a soft alignment budget. As the alignment functional we use the decoding-invariant Alignment Quality Index (AQI), a latent-space criterion that captures how cleanly aligned and misaligned behaviors separate in representation space. Across five model families (LLaMA-3 8B, Mistral 7B, Qwen 2, Phi-3.5, Gemma 2), merging safety anchors with task experts, AlignMerge improves alignment metrics (AQI, toxicity, LLM-judge alignment) while matching or exceeding the best expert on instruction-following, reasoning, and helpfulness. It also exhibits smaller alignment-subspace drift and fewer budget violations than Fisher soups, TIES, SafeMerge, and MergeAlign. These results make alignment-preserving merging a first-class design goal and suggest a path to geometry-aware composition of future foundation models.
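The abstract specifies only the top-level objective, L_AlignMerge = L_geo + lambda_align * L_align + lambda_bud * L_bud, so the sketch below is a toy illustration under stated assumptions, not the authors' implementation. It assumes (1) a diagonal Fisher per expert, with L_geo taken as a Fisher-weighted squared distance to each expert (a local quadratic stand-in for the Fisher-Rao distance); (2) L_align as the squared norm of the merge's displacement from the aligned anchor after projection by P_A; and (3) L_bud as a squared hinge on that projected drift exceeding a budget. The paper's budget is stated in terms of alignment (via AQI), which this surrogate does not compute. The function names `alignment_projector`, `alignmerge_objective`, and `alignmerge`, the default hyperparameters, and the toy vectors are all hypothetical.

```python
import numpy as np


def alignment_projector(directions: np.ndarray) -> np.ndarray:
    """Orthogonal projector P_A onto the span of the columns of `directions`."""
    q, _ = np.linalg.qr(directions)  # orthonormal basis for the alignment subspace
    return q @ q.T


def alignmerge_objective(theta, experts, fishers, weights, anchor, P_A,
                         lam_align=1.0, lam_bud=10.0, budget=0.01):
    """Return (loss, grad) of a hypothetical L_geo + lam_align*L_align + lam_bud*L_bud."""
    grad = np.zeros_like(theta)

    # L_geo (assumed form): weighted diagonal-Fisher squared distance to each expert.
    l_geo = 0.0
    for w, th_k, f_k in zip(weights, experts, fishers):
        d = theta - th_k
        l_geo += 0.5 * w * np.sum(f_k * d * d)
        grad += w * f_k * d

    # L_align (assumed form): squared motion away from the aligned anchor
    # inside the alignment subspace.
    drift = P_A @ (theta - anchor)
    sq_drift = float(drift @ drift)
    l_align = 0.5 * sq_drift
    grad += lam_align * drift  # P_A is symmetric and idempotent

    # L_bud (assumed surrogate): squared hinge, zero while sq_drift <= budget.
    excess = max(0.0, sq_drift - budget)
    l_bud = 0.5 * excess ** 2
    grad += lam_bud * excess * 2.0 * drift

    loss = l_geo + lam_align * l_align + lam_bud * l_bud
    return loss, grad


def alignmerge(experts, fishers, weights, anchor, P_A, lr=0.01, steps=2000, **kw):
    """Plain gradient descent on the objective, starting from the linear soup.

    The fixed step size must stay small relative to the objective's curvature.
    """
    theta = np.average(np.stack(experts), axis=0, weights=weights)
    for _ in range(steps):
        _, grad = alignmerge_objective(theta, experts, fishers, weights,
                                       anchor, P_A, **kw)
        theta = theta - lr * grad
    return theta


if __name__ == "__main__":
    anchor = np.zeros(3)                       # toy aligned base
    experts = [np.array([1.0, 1.0, 0.0]),      # toy expert A
               np.array([1.0, 0.0, 1.0])]      # toy expert B
    fishers = [np.ones(3), np.ones(3)]         # toy diagonal Fisher estimates
    P_A = alignment_projector(np.array([[1.0], [0.0], [0.0]]))  # axis 0 is "alignment-sensitive"
    merged = alignmerge(experts, fishers, [0.5, 0.5], anchor, P_A)
    print(merged)  # axis 0 ends near 0.289; axes 1 and 2 stay at 0.5
```

In this toy case the linear soup would move the alignment-sensitive coordinate a full unit from the anchor. The penalty and budget pull it back toward the anchor, while the coordinates outside the subspace keep the soup's values. That is the qualitative behaviour the abstract describes: steering the merge rather than validating it post hoc.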

