CL | Apr 20, 2025

Efficient Knowledge Transfer in Multi-Task Learning through Task-Adaptive Low-Rank Representation

Xiao Zhang, Kangsheng Wang, Tianyu Hu, Huimin Ma

arXiv:2505.00009v1 | 4 citations | h-index: 8 | ICME

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently transferring knowledge to new tasks in real-world applications, offering an incremental improvement over existing prompt tuning methods.

The paper tackles the problem of multi-task learning with pre-trained language models by proposing TA-LoRA, which uses low-rank representation and a fast-slow weights mechanism to better capture task-specific knowledge, achieving state-of-the-art performance on 16 tasks in full-data and few-shot settings.

Pre-trained language models (PLMs) demonstrate remarkable intelligence but struggle with emerging tasks unseen during training in real-world applications. Training separate models for each new task is usually impractical. Multi-task learning (MTL) addresses this challenge by transferring shared knowledge from source tasks to target tasks. As a dominant parameter-efficient fine-tuning method, prompt tuning (PT) enhances MTL by introducing an adaptable vector that captures task-specific knowledge, which acts as a prefix to the original prompt that preserves shared knowledge, while keeping PLM parameters frozen. However, PT struggles to effectively capture the heterogeneity of task-specific knowledge due to its limited representational capacity. To address this challenge, we propose Task-Adaptive Low-Rank Representation (TA-LoRA), an MTL method built on PT that employs a low-rank representation to model task heterogeneity and a fast-slow weights mechanism in which the slow weight encodes shared knowledge while the fast weight captures task-specific nuances. This separation avoids the mixing of shared and task-specific knowledge caused by training low-rank representations from scratch. Moreover, a zero-initialized attention mechanism is introduced to minimize the disruption of immature low-rank components on original prompts during warm-up epochs. Experiments on 16 tasks demonstrate that TA-LoRA achieves state-of-the-art performance in full-data and few-shot settings while maintaining superior parameter efficiency.
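
The abstract does not give the exact parameterization, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the task-specific update to a shared prompt is factored as a product of a shared "slow" matrix and a per-task "fast" matrix, and it stands in for the zero-initialized attention with a much simpler zero-initialized scalar gate. The class name `TaskAdaptivePrompt`, the factor shapes, the `tanh` gate, and the task names are all hypothetical choices made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng, scale=0.02):
    """Small Gaussian-initialized matrix as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TaskAdaptivePrompt:
    """Hypothetical illustration of a shared prompt plus a gated low-rank update.

    Assumptions (not taken from the paper):
      * the update is slow (length x rank) @ fast[task] (rank x dim);
      * `slow` is shared by all tasks, `fast` holds one matrix per task;
      * a scalar gate, initialized to zero, replaces the paper's
        zero-initialized attention mechanism.
    """

    def __init__(self, base_prompt, rank, task_names, seed=0):
        rng = random.Random(seed)
        length, dim = len(base_prompt), len(base_prompt[0])
        self.base_prompt = base_prompt                  # shared prompt, length x dim
        self.slow = random_matrix(length, rank, rng)    # shared (slow) factor
        self.fast = {name: random_matrix(rank, dim, rng) for name in task_names}
        self.gate = 0.0                                 # zero-initialized gate

    def prompt_for(self, task):
        """Return the prompt for `task`: base + tanh(gate) * (slow @ fast[task])."""
        delta = matmul(self.slow, self.fast[task])
        g = math.tanh(self.gate)
        return [
            [p + g * d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(self.base_prompt, delta)
        ]

    def per_task_parameters(self):
        """Number of values stored per task (the fast factor only)."""
        any_fast = next(iter(self.fast.values()))
        return len(any_fast) * len(any_fast[0])
```

With the gate at zero, `prompt_for` returns the base prompt unchanged, which mirrors the stated goal of keeping immature low-rank components from disturbing the original prompt early in training. Once the gate moves away from zero, each task receives its own low-rank perturbation while storing only `rank * dim` task-specific values instead of a full `length * dim` prompt.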
