CV | Jul 28, 2023

Prompt Guided Transformer for Multi-Task Dense Prediction

arXiv:2307.15362v1 | 37 citations | h-index: 12
Originality: Incremental advance
AI Analysis

This paper addresses the problem of balancing performance and model size in multi-task dense prediction, which matters to computer vision researchers and practitioners. The contribution is incremental, since it builds on existing task-conditional and multi-decoder methods.

The paper tackles the trade-off between performance and parameter efficiency in multi-task dense prediction by introducing Prompt Guided Transformer (PGT). PGT achieves state-of-the-art results among task-conditional methods on the PASCAL-Context and NYUD-v2 benchmarks while using fewer parameters, with the decoder accounting for only 2.7% of total parameters.

Task-conditional architectures offer an advantage in parameter efficiency but fall short in performance compared to state-of-the-art multi-decoder methods. How to trade off performance against model parameters is an important and difficult problem. In this paper, we introduce a simple and lightweight task-conditional model called Prompt Guided Transformer (PGT) to address this challenge. Our approach designs a Prompt-conditioned Transformer block, which incorporates task-specific prompts in the self-attention mechanism to achieve global dependency modeling and parameter-efficient feature adaptation across multiple tasks. This block is integrated into both the shared encoder and decoder, enhancing the capture of intra- and inter-task features. Moreover, we design a lightweight decoder that accounts for only 2.7% of the total model parameters, further reducing parameter usage. Extensive experiments on two multi-task dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, and maintains a favorable balance between performance and parameter size.
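
To make the idea of task-specific prompts inside self-attention more concrete, here is a minimal NumPy sketch. It is not the paper's implementation: it assumes a prefix-style design in which a small set of per-task prompt vectors is prepended to the keys and values of a single-head attention layer whose projection weights are shared across tasks. The class name `PromptConditionedAttention`, the task names `"semseg"` and `"depth"`, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class PromptConditionedAttention:
    """Single-head self-attention conditioned on task prompts.

    Hypothetical sketch: the paper's actual block may differ
    (for example in how prompts enter the attention computation).
    """

    def __init__(self, dim, num_prompts, tasks, seed=0):
        rng = np.random.default_rng(seed)
        scale = dim ** -0.5
        self.dim = dim
        # Projection weights shared by every task.
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        # The only task-specific parameters: a few prompt vectors per task.
        self.prompts = {
            t: rng.normal(0.0, scale, (num_prompts, dim)) for t in tasks
        }

    def __call__(self, tokens, task):
        # tokens: (n, dim) image tokens; prompts: (p, dim).
        context = np.concatenate([self.prompts[task], tokens], axis=0)
        q = tokens @ self.w_q            # (n, dim)
        k = context @ self.w_k           # (p + n, dim)
        v = context @ self.w_v           # (p + n, dim)
        attn = softmax(q @ k.T / np.sqrt(self.dim))  # (n, p + n)
        return attn @ v                  # (n, dim)


# Illustrative usage with assumed sizes.
layer = PromptConditionedAttention(dim=32, num_prompts=4,
                                   tasks=["semseg", "depth"])
tokens = np.random.default_rng(1).normal(size=(16, 32))
out_seg = layer(tokens, "semseg")
out_depth = layer(tokens, "depth")

shared_params = layer.w_q.size + layer.w_k.size + layer.w_v.size
per_task_params = layer.prompts["semseg"].size
print(out_seg.shape, out_depth.shape, shared_params, per_task_params)
```

In this sketch the same tokens yield different features for each task, while each additional task adds only `num_prompts * dim` parameters on top of the shared weights. That is the general sense in which prompt conditioning can be parameter-efficient; the figures here say nothing about the 2.7% decoder share reported in the abstract.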
