CL · AI · May 7

Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning

Bing Wang, Ximing Li, Changchun Li, Jinjin Chi, Gang Niu, Masashi Sugiyama

arXiv:2605.05676 · 94.4 · h-index: 2

Predicted impact: top 14% in CL · last 90 days · Originality: Highly original

AI Analysis

For practitioners fine-tuning LLMs on multiple tasks, this work mitigates a key bottleneck, cross-task interference, with a novel decomposition approach.

The paper addresses cross-task interference in multi-task instruct-tuning of LLMs, where conflicting gradients on shared parameters degrade performance. The proposed BADIT method decomposes parameters into orthogonal LoRA experts representing basic abilities and outperforms state-of-the-art (SOTA) methods on the SuperNI benchmark across 6 LLMs.
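The summary attributes interference to conflicting gradients on shared parameters. One common way to make "conflicting" concrete, assumed here rather than taken from the paper, is to flag task pairs whose gradients on the shared parameters have negative cosine similarity. The minimal sketch below uses plain Python lists as stand-ins for flattened gradients; `cosine`, `conflicting_pairs`, and the task names are hypothetical.

```python
import math


def cosine(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0 or n2 == 0:
        raise ValueError("cosine similarity is undefined for a zero gradient")
    return dot / (n1 * n2)


def conflicting_pairs(task_grads, threshold=0.0):
    """Return (task_a, task_b, cos) for every task pair whose gradients on the
    shared parameters have cosine similarity below `threshold`."""
    names = sorted(task_grads)  # deterministic pair order
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            c = cosine(task_grads[a], task_grads[b])
            if c < threshold:
                pairs.append((a, b, c))
    return pairs


# Hypothetical per-task gradients on the same three shared parameters.
grads = {"qa": [1.0, 0.5, 0.0], "summ": [-1.0, 0.2, 0.3], "nli": [0.9, 0.6, 0.1]}
for a, b, c in conflicting_pairs(grads):
    print(f"{a} vs {b}: cos = {c:.3f}")
```

A negative cosine means that, to first order, a gradient step on one task increases the other task's loss; in this toy input `summ` conflicts with both other tasks. In practice each vector would be the gradient of one task's loss flattened over the shared parameters.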

Recently, the strong performance of large language models (LLMs) has been driven largely by multi-task instruct-tuning. Unfortunately, this training paradigm suffers from a key issue, known as cross-task interference, caused by conflicting gradients over parameters shared among different tasks. Some previous methods mitigate this issue by isolating task-specific parameters, e.g., through task-specific neuron selection or mixture-of-experts. In this paper, we empirically reveal that cross-task interference persists in these existing solutions because many parameters are still shared by different tasks, and accordingly we propose a novel solution, namely Basic Abilities Decomposition for multi-task Instruct-Tuning (BADIT). Specifically, we empirically find that certain parameters are consistently co-activated, and that co-activated parameters naturally organize into base groups. This motivates the analogy that LLMs encode several orthogonal basic abilities, and that any task can be represented as a linear combination of these abilities. Accordingly, BADIT decomposes LLM parameters into orthogonal high-singular-value LoRA experts representing basic abilities, and dynamically enforces their orthogonality during training via spherical clustering of rank-1 components. We conduct extensive experiments on the SuperNI benchmark with 6 LLMs, and the empirical results demonstrate that BADIT can outperform SOTA methods and reduce the degree of cross-task interference.
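The abstract says BADIT builds orthogonal high-singular-value LoRA experts and keeps them orthogonal via spherical clustering of rank-1 components, but it does not spell out the procedure. The NumPy sketch below illustrates the two ingredients under stated assumptions: (1) the top singular triplets of a single weight matrix are split contiguously into experts, each stored as LoRA factors `B`, `A` with every singular value divided between them as square roots, so the experts are orthogonal because singular vectors are; (2) a standard spherical k-means groups vectors (for example, rank-1 component directions) by cosine similarity. The helper names `svd_lora_experts`, `task_update`, `max_cross_expert_overlap`, and `spherical_kmeans`, the contiguous grouping, and the square-root split are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def svd_lora_experts(W, num_experts, rank_per_expert):
    """Split the top singular triplets of W into LoRA factor pairs (B, A).

    Hypothetical scheme: expert e receives triplets e*r .. (e+1)*r - 1, so
    experts built from larger singular values come first, and B_e @ A_e
    equals that slice of the SVD of W.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # s is sorted descending
    if num_experts * rank_per_expert > s.size:
        raise ValueError("not enough singular values for the requested experts")
    experts = []
    for e in range(num_experts):
        idx = slice(e * rank_per_expert, (e + 1) * rank_per_expert)
        root = np.sqrt(s[idx])        # split each singular value between B and A
        B = U[:, idx] * root          # (d_out, r): scale column j by sqrt(s_j)
        A = root[:, None] * Vt[idx]   # (r, d_in): scale row j by sqrt(s_j)
        experts.append((B, A))
    return experts


def task_update(experts, weights):
    """Task-specific update as a weighted sum of expert updates B_k @ A_k."""
    return sum(w * (B @ A) for w, (B, A) in zip(weights, experts))


def max_cross_expert_overlap(experts):
    """Largest |cosine| between output directions (columns of B) of two
    different experts; near 0 means mutually orthogonal output subspaces."""
    dirs = [B / np.linalg.norm(B, axis=0, keepdims=True) for B, _ in experts]
    worst = 0.0
    for i in range(len(dirs)):
        for j in range(i + 1, len(dirs)):
            worst = max(worst, float(np.abs(dirs[i].T @ dirs[j]).max()))
    return worst


def spherical_kmeans(X, k, iters=50, seed=0):
    """Standard spherical k-means: cluster the rows of X by cosine similarity.

    Returns (labels, centroids); every centroid has unit norm.
    """
    rng = np.random.default_rng(seed)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # project rows onto the unit sphere
    C = Xn[rng.choice(len(Xn), size=k, replace=False)].copy()  # k distinct rows as seeds
    labels = np.zeros(len(Xn), dtype=int)
    for _ in range(iters):
        labels = np.argmax(Xn @ C.T, axis=1)  # assign each row to its most similar centroid
        for c in range(k):
            total = Xn[labels == c].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # an empty (or cancelling) cluster keeps its old centroid
                C[c] = total / norm  # mean direction, renormalized to the sphere
    return labels, C
```

`task_update` mirrors the abstract's premise that a task is a linear combination of basic abilities: with every singular triplet assigned and all weights set to 1, it reconstructs `W` exactly. How BADIT learns per-task weights, and which features of a rank-1 component it clusters during training, are not specified in the abstract.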

