LG · AI · Feb 6

Training Data Selection with Gradient Orthogonality for Efficient Domain Adaptation

arXiv:2602.06359v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This paper addresses the trade-off between domain expertise and general reasoning when adapting LLMs to specialized domains, offering an efficient data-selection approach to mitigating catastrophic forgetting.

The paper tackles catastrophic forgetting during fine-tuning of large language models for specialized domains. It proposes Orthogonal Gradient Selection (OGS), which dynamically selects training samples whose gradients are orthogonal to a general-knowledge anchor, so that updates preserve general capabilities. The authors report improved domain performance and training efficiency, with performance on general tasks such as GSM8K maintained or improved.

Fine-tuning large language models (LLMs) for specialized domains often necessitates a trade-off between acquiring domain expertise and retaining general reasoning capabilities, a phenomenon known as catastrophic forgetting. Existing remedies face a dichotomy: gradient surgery methods offer geometric safety but incur prohibitive computational costs via online projections, while efficient data selection approaches reduce overhead but remain blind to conflict-inducing gradient directions. In this paper, we propose Orthogonal Gradient Selection (OGS), a data-centric method that harmonizes domain performance, general capability retention, and training efficiency. OGS shifts the geometric insights of gradient projection from the optimizer to the data selection stage by treating data selection as a constrained decision-making process. By leveraging a lightweight Navigator model and reinforcement learning techniques, OGS dynamically identifies training samples whose gradients are orthogonal to a general-knowledge anchor. This approach ensures naturally safe updates for target models without modifying the optimizer or incurring runtime projection costs. Experiments across medical, legal, and financial domains demonstrate that OGS achieves excellent results, significantly improving domain performance and training efficiency while maintaining or even enhancing performance on general tasks such as GSM8K.
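The geometric idea in the abstract can be illustrated with a toy sketch. This is not the paper's method: OGS uses a Navigator model and reinforcement learning, neither of which is specified here. The sketch assumes per-sample gradients and an anchor gradient are already available as plain vectors, and it uses a hypothetical cosine-similarity tolerance `tol` as the orthogonality test. The function names `select_near_orthogonal` and `project_out_conflict` are illustrative. The second function shows, for contrast, a PCGrad-style projection of the kind that gradient surgery applies inside the optimizer at every step.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def select_near_orthogonal(sample_grads, anchor_grad, tol=0.1):
    """Data-selection view (assumed criterion): keep the indices of samples
    whose gradient is nearly orthogonal to the anchor, |cos| <= tol."""
    return [
        i for i, g in enumerate(sample_grads)
        if abs(cosine(g, anchor_grad)) <= tol
    ]


def project_out_conflict(grad, anchor_grad):
    """Optimizer view (PCGrad-style): if grad conflicts with the anchor
    (negative dot product), remove its component along the anchor."""
    d = dot(grad, anchor_grad)
    if d >= 0:
        return list(grad)
    scale = d / dot(anchor_grad, anchor_grad)
    return [g - scale * a for g, a in zip(grad, anchor_grad)]


# Toy 2-D example: the anchor points along the first axis.
anchor = [1.0, 0.0]
grads = [[0.0, 2.0], [-1.0, 1.0], [0.05, 1.0], [3.0, 0.0]]

selected = select_near_orthogonal(grads, anchor, tol=0.1)
print(selected)
print(project_out_conflict([-1.0, 1.0], anchor))
```

In this toy setup, selection happens once over the data, so the training loop itself needs no per-step projection. That is the cost argument the abstract makes for moving the geometry from the optimizer to the data selection stage.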

