CV · Sep 6, 2022

Sequential Cross Attention Based Multi-task Learning

arXiv:2209.02518v1 · 6 citations · h-index: 37
Originality: Incremental advance
AI Analysis

This paper addresses multi-task learning for visual scene understanding. It proposes a new architecture that improves performance, though the approach appears incremental.

The paper tackles the problem of transferring useful information between multiple tasks in visual scene understanding with minimal interference, achieving state-of-the-art performance on the NYUD-v2 and PASCAL-Context datasets.

In multi-task learning (MTL) for visual scene understanding, it is crucial to transfer useful information between multiple tasks with minimal interference. In this paper, we propose a novel architecture that effectively transfers informative features by applying the attention mechanism to the multi-scale features of the tasks. Since applying the attention module directly to all possible features across scales and tasks incurs high computational complexity, we propose to apply the attention module sequentially, first across tasks and then across scales. The cross-task attention module (CTAM) is first applied to facilitate the exchange of relevant information between the features of different tasks at the same scale. The cross-scale attention module (CSAM) then aggregates useful information from feature maps at different resolutions within the same task. We also attempt to capture long-range dependencies through a self-attention module in the feature extraction network. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the NYUD-v2 and PASCAL-Context datasets.
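
The sequential ordering described in the abstract (cross-task attention at each scale, then cross-scale attention within each task) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the use of plain scaled dot-product attention without learned projections, the residual connections, and the choice of the finest scale as the query in the cross-scale step are all assumptions made for illustration. The sketch also assumes at least two tasks and at least two scales.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query, context):
    """Scaled dot-product attention: query (Nq, C), context (Nk, C) -> (Nq, C).

    Assumption: no learned query/key/value projections, to keep the sketch short.
    """
    scale = 1.0 / np.sqrt(query.shape[-1])
    weights = softmax(query @ context.T * scale, axis=-1)
    return weights @ context


def cross_task_attention(feats):
    """Hypothetical CTAM-like step. feats[t][s] has shape (N_s, C).

    At each scale s, every task attends to the other tasks' features
    at that same scale, with a residual connection (an assumption).
    """
    num_tasks, num_scales = len(feats), len(feats[0])
    out = []
    for t in range(num_tasks):
        row = []
        for s in range(num_scales):
            others = np.concatenate(
                [feats[u][s] for u in range(num_tasks) if u != t], axis=0
            )
            row.append(feats[t][s] + cross_attention(feats[t][s], others))
        out.append(row)
    return out


def cross_scale_attention(feats):
    """Hypothetical CSAM-like step.

    For each task, the finest scale (index 0, an assumption) attends to
    that task's features at all other scales.
    """
    out = []
    for task_feats in feats:
        coarser = np.concatenate(task_feats[1:], axis=0)
        out.append(task_feats[0] + cross_attention(task_feats[0], coarser))
    return out


def sequential_cross_attention(feats):
    # Task step first, then scale step, following the order in the abstract.
    return cross_scale_attention(cross_task_attention(feats))


# Toy input: 2 tasks, 3 scales with 64, 16 and 4 tokens, 8 channels each.
rng = np.random.default_rng(0)
token_counts = [64, 16, 4]
channels = 8
feats = [
    [rng.standard_normal((n, channels)) for n in token_counts]
    for _ in range(2)
]
fused = sequential_cross_attention(feats)
print([f.shape for f in fused])  # [(64, 8), (64, 8)]
```

In this sketch each attention call only sees tokens from one scale (task step) or one task (scale step), rather than every task-scale feature map at once, which is the complexity argument the abstract gives for the sequential design.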

Code Implementations: 1 repo
