CL | Mar 3, 2025

Superficial Self-Improved Reasoners Benefit from Model Merging

arXiv:2503.02103v2 | 17 citations | h-index: 30 | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in self-improving AI systems that matters to both researchers and practitioners, though the advance is incremental because it builds on prior work on model collapse.

The paper tackles superficial self-improvement in language models, where in-domain reasoning accuracy improves but out-of-domain generalization declines because the model memorizes rather than reasons. It proposes Iterative Model Merging to mitigate this issue, moving towards more stable self-improving systems.

As scaled language models (LMs) approach human-level reasoning capabilities, self-improvement emerges as a solution to synthesizing high-quality data corpora. While previous research has identified model collapse as a risk in self-improvement, where model outputs become increasingly deterministic, we discover a more fundamental challenge: the superficial self-improved reasoners phenomenon. In particular, our analysis reveals that even when LMs show improved in-domain (ID) reasoning accuracy, they actually compromise their generalized reasoning capabilities on out-of-domain (OOD) tasks due to memorization rather than genuine reasoning. Through a systematic investigation of LM architecture, we discover that during self-improvement, LM weight updates are concentrated in less reasoning-critical layers, leading to superficial learning. To address this, we propose Iterative Model Merging (IMM), a method that strategically combines weights from original and self-improved models to preserve generalization while incorporating genuine reasoning improvements. Our approach effectively mitigates both LM collapse and superficial learning, moving towards more stable self-improving systems.
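The abstract describes two mechanisms that a small example can make concrete: measuring where weight updates concentrate across layers, and merging original and self-improved weights over several rounds. The sketch below is an illustration under stated assumptions, not the paper's implementation. It assumes plain linear interpolation with a hypothetical coefficient `alpha`, assumes each round merges back into the fixed original weights, and represents a model as a dictionary of flat float lists. The names `merge_weights`, `relative_update`, `iterative_merge`, and `self_improve_step` are all hypothetical.

```python
from typing import Callable, Dict, List

# Toy stand-in for a model: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_weights(base: Weights, tuned: Weights, alpha: float) -> Weights:
    """Linear interpolation per parameter: (1 - alpha) * base + alpha * tuned.

    Assumption: the summary does not state the merge rule, so simple
    interpolation is used here purely for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if base.keys() != tuned.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [(1.0 - alpha) * b + alpha * t
               for b, t in zip(base[name], tuned[name])]
        for name in base
    }


def relative_update(base: Weights, tuned: Weights) -> Dict[str, float]:
    """Per-parameter L2 norm of (tuned - base), divided by the L2 norm of base.

    A rough way to see which layers changed most during self-improvement.
    Parameters whose base norm is zero are reported as 0.0 to avoid division
    by zero (a simplification for this toy example).
    """
    result = {}
    for name in base:
        diff = sum((t - b) ** 2 for b, t in zip(base[name], tuned[name])) ** 0.5
        norm = sum(b * b for b in base[name]) ** 0.5
        result[name] = diff / norm if norm > 0 else 0.0
    return result


def iterative_merge(
    original: Weights,
    self_improve_step: Callable[[Weights], Weights],
    rounds: int,
    alpha: float,
) -> Weights:
    """Alternate a self-improvement step with a merge back into the original.

    Assumption: every round merges the newly tuned weights with the fixed
    original weights; `self_improve_step` is a hypothetical training routine
    supplied by the caller.
    """
    current = original
    for _ in range(rounds):
        tuned = self_improve_step(current)
        current = merge_weights(original, tuned, alpha)
    return current


if __name__ == "__main__":
    base = {"layer0": [1.0, 2.0], "layer1": [0.0, 4.0]}
    tuned = {"layer0": [1.0, 2.0], "layer1": [2.0, 4.0]}
    print(relative_update(base, tuned))
    print(merge_weights(base, tuned, alpha=0.5))
```

In this toy setup only `layer1` changes, so `relative_update` attributes the whole update to that layer, and a smaller `alpha` keeps the merged model closer to the original weights.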

