CV | Jan 8, 2025

TADFormer: Task-Adaptive Dynamic Transformer for Efficient Multi-Task Learning

arXiv:2501.04293v2 | 5 citations | h-index: 7 | CVPR
AI Analysis

This paper addresses the high computational cost of multi-task learning for vision tasks, offering an incremental improvement over existing parameter-efficient methods.

The paper tackles the computational inefficiency of full fine-tuning in multi-task learning by introducing TADFormer, a parameter-efficient fine-tuning framework that achieves higher accuracy on dense scene understanding tasks while reducing trainable parameters by up to 8.4 times compared to full fine-tuning.

The transfer learning paradigm has driven substantial advances in various vision tasks. However, as state-of-the-art models continue to grow, classical full fine-tuning often becomes computationally impractical, particularly in multi-task learning (MTL) setups, where training complexity increases in proportion to the number of tasks. Consequently, recent studies have explored Parameter-Efficient Fine-Tuning (PEFT) for MTL architectures. Despite some progress, these approaches remain limited in capturing the fine-grained, task-specific features that are crucial to MTL. In this paper, we introduce the Task-Adaptive Dynamic transFormer, termed TADFormer, a novel PEFT framework that performs task-aware feature adaptation in a fine-grained manner by dynamically considering task-specific input contexts. TADFormer introduces parameter-efficient prompting for task adaptation and a Dynamic Task Filter (DTF) that captures task information conditioned on input contexts. Experiments on the PASCAL-Context benchmark demonstrate that the proposed method achieves higher accuracy in dense scene understanding tasks while reducing the number of trainable parameters by up to 8.4 times compared to full fine-tuning of MTL models. TADFormer also demonstrates superior parameter efficiency and accuracy compared to recent PEFT methods.
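
The abstract does not describe how the Dynamic Task Filter is built, so the following is only a minimal sketch of the general idea of a filter "conditioned on input contexts", not the paper's implementation. It assumes a toy 1-D feature layout, a global-average context vector, and a small task-specific linear map that generates a softmax-normalized kernel; the names `dynamic_task_filter`, `seg_weights`, and `depth_weights` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_task_filter(features, task_weights):
    """Filter `features` with a kernel generated from the input itself.

    features:     C channels, each a list of N floats (toy 1-D feature map).
    task_weights: K x C matrix, a hypothetical task-specific kernel generator
                  (K is the odd kernel size). Illustrative only; the paper's
                  actual DTF design is not given in the abstract.
    """
    # Input context: global average of each channel.
    context = [sum(channel) / len(channel) for channel in features]
    # Task-specific linear map from the context to K kernel logits.
    logits = [sum(w * c for w, c in zip(row, context)) for row in task_weights]
    kernel = softmax(logits)
    pad = len(kernel) // 2
    filtered = []
    for channel in features:
        padded = [0.0] * pad + list(channel) + [0.0] * pad
        # Same kernel applied to every channel, with zero padding at the edges.
        filtered.append([
            sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(channel))
        ])
    return kernel, filtered


# Two hypothetical task heads with different generator weights (K=3, C=2).
seg_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
depth_weights = [[0.0, 1.0], [1.0, 0.0], [0.5, -1.0]]

image_a = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
image_b = [[0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0]]

kernel_seg_a, out_seg_a = dynamic_task_filter(image_a, seg_weights)
kernel_seg_b, _ = dynamic_task_filter(image_b, seg_weights)
kernel_depth_a, _ = dynamic_task_filter(image_a, depth_weights)
```

In this sketch the kernel changes with both the input (`kernel_seg_a` versus `kernel_seg_b`) and the task (`kernel_seg_a` versus `kernel_depth_a`), while the only task-specific parameters are the small generator matrices. That is the sense in which a dynamic filter can be both task-adaptive and parameter-efficient.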

