CL · AI · CV · LG · Jul 3, 2025

DoMIX: An Efficient Framework for Exploiting Domain Knowledge in Fine-Tuning

arXiv:2507.02302v1 · 2 citations · h-index: 2 · Has Code · ACL
Originality: Incremental advance
AI Analysis

This work addresses computational bottlenecks for researchers and practitioners in NLP, offering an incremental improvement over existing continual DAP methods.

The paper tackles the inefficiency and inflexibility of continual domain-adaptive pre-training by proposing DoMIX, which uses LoRA modules to enable efficient, order-robust training and provide task-specific pre-trained models, achieving up to 30% faster training and 40% lower memory usage compared to baselines.

Domain-Adaptive Pre-training (DAP) has recently gained attention for its effectiveness in fine-tuning pre-trained models. Building on this, continual DAP has been explored to develop pre-trained models capable of incrementally incorporating different domain datasets. However, existing continual DAP methods face several limitations: (1) high computational cost and GPU memory usage during training; (2) sensitivity to incremental data order; and (3) providing a single, generalized model for all end tasks, which contradicts the essence of DAP. In this paper, we propose DoMIX, a novel approach that addresses these challenges by leveraging LoRA modules, a representative parameter-efficient fine-tuning (PEFT) method. Our approach enables efficient and parallel domain-adaptive pre-training that is robust to domain order and effectively utilizes accumulated knowledge to provide tailored pre-trained models for specific tasks. We also demonstrate that our method can be extended beyond the DAP setting to standard LLM fine-tuning scenarios. Code is available at https://github.com/dohoonkim-ai/DoMIX.
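The abstract does not spell out how the per-domain LoRA modules are combined for an end task. The sketch below illustrates only the general idea under stated assumptions and is not the DoMIX implementation. Each domain adapter is a low-rank update (alpha / r) · B · A trained against the same frozen base weight W, so adapters can be trained in parallel and no domain depends on the order in which another arrived. A task-specific weight is then formed by mixing the adapters. The softmax-weighted sum and every name here (`lora_delta`, `mix_domains`, `logits`) are hypothetical; DoMIX's actual mixing mechanism may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of x times columns of y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x: Matrix, y: Matrix, scale: float = 1.0) -> Matrix:
    """Element-wise x + scale * y, returning a new matrix."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def lora_delta(B: Matrix, A: Matrix, alpha: float, rank: int) -> Matrix:
    """Standard LoRA update Delta W = (alpha / rank) * B @ A."""
    s = alpha / rank
    return [[s * v for v in row] for row in matmul(B, A)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_domains(W: Matrix, deltas: List[Matrix], logits: List[float]) -> Matrix:
    """Hypothetical mixing rule: W_task = W + sum_d softmax(logits)_d * Delta W_d.

    W itself is never modified (it stays frozen); a new matrix is returned.
    """
    out = [row[:] for row in W]
    for w, delta in zip(softmax(logits), deltas):
        out = add(out, delta, scale=w)
    return out


# Usage: a frozen 2x2 base weight and two rank-1 domain adapters.
W = [[1.0, 0.0], [0.0, 1.0]]
B1, A1 = [[1.0], [0.0]], [[1.0, 0.0]]  # hypothetical adapter for domain 1
B2, A2 = [[0.0], [1.0]], [[0.0, 1.0]]  # hypothetical adapter for domain 2
deltas = [lora_delta(B1, A1, alpha=1.0, rank=1), lora_delta(B2, A2, alpha=1.0, rank=1)]

W_task = mix_domains(W, deltas, logits=[0.0, 0.0])
W_task_reversed = mix_domains(W, deltas[::-1], logits=[0.0, 0.0])
print(W_task)  # [[1.5, 0.0], [0.0, 1.5]]
```

With equal mixing scores, presenting the domains in reverse order yields the same task weight, which is the order-robustness property the abstract describes. In a real setup the mixing scores would be learned on the end task while W and the domain adapters stay frozen.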
