CV · LG · Sep 5, 2019

Auxiliary Learning for Deep Multi-task Learning

arXiv:1909.02214v2 · 13 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency and performance challenges in multi-task learning for computer vision applications, though it is incremental as it builds on existing parameter-sharing methods.

The paper tackles two problems in multi-task learning: the optimization difficulty of hard parameter sharing and the poor scalability of soft parameter sharing. It proposes an auxiliary module that mimics soft sharing to assist training, and reports improved performance on pixel-wise prediction tasks such as semantic segmentation, depth estimation, and surface normal prediction.

Multi-task learning (MTL) is an efficient way to solve multiple tasks simultaneously, giving better speed and performance than handling each task in turn. Most current methods can be categorized as either: (i) hard parameter sharing, where a subset of the parameters is shared among tasks while the other parameters are task-specific; or (ii) soft parameter sharing, where all parameters are task-specific but are jointly regularized. Both methods suffer from limitations: the shared hidden layers of the former are difficult to optimize because of the competing objectives, while the complexity of the latter grows linearly with the number of tasks. To mitigate these drawbacks, this paper proposes an alternative in which we explicitly construct an auxiliary module that mimics soft parameter sharing to assist the optimization of the hard parameter sharing layers in the training phase. In particular, the auxiliary module takes the outputs of the shared hidden layers as inputs and is supervised by the auxiliary task loss. During training, the auxiliary module is jointly optimized with the MTL network, serving as a regularizer by introducing an inductive bias to the shared layers. In the testing phase, only the original MTL network is kept. Thus our method avoids the limitations of both categories. We evaluate the proposed auxiliary module on pixel-wise prediction tasks, including semantic segmentation, depth estimation, and surface normal prediction, with different network structures. Extensive experiments over various settings verify the effectiveness of our method.
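The training and testing flow described in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's architecture: the scalar "layers", the squared-error losses, the loss weights, the toy targets, and all names (`LOSS_WEIGHTS`, `train_step`, `predict`) are assumptions made for illustration. It only shows the mechanics: an auxiliary branch reads the shared layer's output, its loss sends extra gradient into the shared parameter during training, and the branch is dropped at test time.

```python
# Toy sketch (assumed setup, not the paper's model): one scalar shared layer,
# two scalar task heads, and one scalar auxiliary branch used only in training.

# Assumed loss weights; the auxiliary loss is down-weighted.
LOSS_WEIGHTS = {"task1": 1.0, "task2": 1.0, "aux": 0.5}


def train_step(params, x, targets, lr=0.02):
    """One SGD step on the joint loss: task losses plus the auxiliary loss."""
    h = params["shared"] * x  # output of the shared (hard-sharing) layer
    loss, grad_h, grads = 0.0, 0.0, {}
    for name, weight in LOSS_WEIGHTS.items():
        err = params[name] * h - targets[name]
        loss += weight * err ** 2
        d_out = 2.0 * weight * err        # d(loss) / d(branch output)
        grads[name] = d_out * h           # gradient for this branch's parameter
        grad_h += d_out * params[name]    # every branch, aux included, reaches h
    grads["shared"] = grad_h * x
    for name, g in grads.items():
        params[name] -= lr * g
    return loss


def predict(params, x):
    """Test-time path: only the shared layer and the task heads are used."""
    h = params["shared"] * x
    return params["task1"] * h, params["task2"] * h


params = {"shared": 1.0, "task1": 0.1, "task2": -0.1, "aux": 0.1}
inputs = [-1.0, -0.5, 0.5, 1.0]
epoch_losses = []
for _ in range(500):
    total = 0.0
    for x in inputs:
        # Hypothetical linear targets for the two tasks and the auxiliary task.
        targets = {"task1": 2.0 * x, "task2": -3.0 * x, "aux": 1.0 * x}
        total += train_step(params, x, targets)
    epoch_losses.append(total)

# Discard the auxiliary branch before inference, as in the testing phase.
test_params = {k: v for k, v in params.items() if k != "aux"}
y1, y2 = predict(test_params, 1.0)
```

Because `predict` never reads the auxiliary parameter, inference cost in this sketch is the same as for a plain hard-sharing model, which matches the abstract's statement that only the original MTL network is kept at test time.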

