CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training
This work provides a more comprehensive evaluation of forgetting in LLMs for practitioners using third-party pre-trained models, but the findings are largely empirical and incremental.
The paper introduces CapTrack, a framework for evaluating forgetting in LLM post-training beyond parametric knowledge. It finds that forgetting extends to systematic drift in robustness and default behaviors, with instruction fine-tuning inducing the strongest relative drift while preference optimization is more conservative.
Large language model (LLM) post-training enhances latent skills, unlocks value alignment, improves performance, and enables domain adaptation. Unfortunately, post-training is known to induce forgetting, especially in the ubiquitous use case of building on third-party pre-trained models. Forgetting is typically understood as a loss of parametric or factual knowledge. We argue that this accuracy-centric view is insufficient for modern foundation models and instead define forgetting as systematic model drift that degrades behavior and user experience. In this context, we introduce CapTrack, a capability-centric framework for analyzing forgetting in LLMs that combines a behavioral taxonomy with an evaluation suite centered on capability-specific metrics. Using CapTrack, we conduct a large-scale empirical study across post-training algorithms, domains, and model families, including models up to 80B parameters. We find that forgetting extends beyond parametric knowledge, with pronounced drift in robustness and default behaviors. Instruction fine-tuning induces the strongest relative drift, while preference optimization is more conservative and can partially recover lost capabilities. Differences across model families persist, and no universal mitigation emerges.
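The abstract does not define how "relative drift" is computed, so the following is only a minimal sketch of one plausible reading: the signed change in a capability-specific score, normalized by the base model's score. The function name `relative_drift`, the capability labels, and all numbers are hypothetical placeholders for illustration, not CapTrack's actual metric, API, or results.

```python
from typing import Dict


def relative_drift(
    base: Dict[str, float],
    tuned: Dict[str, float],
    eps: float = 1e-12,
) -> Dict[str, float]:
    """Signed relative change per capability: (tuned - base) / |base|.

    Assumption: each capability has one scalar score where higher is better.
    Capabilities missing from `tuned` are skipped.
    """
    drift: Dict[str, float] = {}
    for capability, base_score in base.items():
        if capability not in tuned:
            continue
        # Guard against division by zero for capabilities with a zero base score.
        denom = max(abs(base_score), eps)
        drift[capability] = (tuned[capability] - base_score) / denom
    return drift


# Placeholder scores, invented purely for illustration.
base_scores = {"parametric_knowledge": 0.80, "robustness": 0.50, "default_behavior": 0.40}
tuned_scores = {"parametric_knowledge": 0.76, "robustness": 0.40, "default_behavior": 0.30}

drift = relative_drift(base_scores, tuned_scores)
# The capability with the most negative value drifted most, relative to its base score.
most_degraded = min(drift, key=drift.get)
```

Normalizing by the base score makes drift comparable across capabilities measured on different scales, which is one reason a relative rather than absolute measure might be preferred when comparing post-training algorithms.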