UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation
This work addresses the challenge of adaptable representation learning for spatially-structured prediction tasks under domain shifts, offering a solution for applications like autonomous driving and robotics, though it is incremental in combining existing techniques.
The paper tackles the limited generalization of unsupervised domain adaptation in multi-task learning by proposing UM-Adapt, a unified framework that combines adversarial cross-task distillation with contour-based content regularization. It reports state-of-the-art transfer learning results on ImageNet classification and comparable performance on PASCAL VOC 2007 detection with a smaller backbone, and its semi-supervised variant outperforms fully-supervised multi-task methods on the NYUD and Cityscapes datasets.
To move towards human-level generalization, there is a need to explore adaptable representation learning methods with greater transferability. Most existing approaches address task-transferability and cross-domain adaptation independently, resulting in limited generalization. In this paper, we propose UM-Adapt, a unified framework that performs unsupervised domain adaptation for spatially-structured prediction tasks while maintaining balanced performance across individual tasks in a multi-task setting. To realize this, we propose two novel regularization strategies: (a) contour-based content regularization (CCR) and (b) exploitation of inter-task coherency using a cross-task distillation module. Furthermore, instead of a conventional ad-hoc domain discriminator, we reuse the cross-task distillation loss as the output of an energy function to adversarially minimize the input domain discrepancy. Through extensive experiments, we demonstrate superior generalizability of the learned representations simultaneously for multiple tasks under domain shifts from synthetic to natural environments. UM-Adapt yields state-of-the-art transfer learning results on ImageNet classification and comparable performance on the PASCAL VOC 2007 detection task, even with a smaller backbone network. Moreover, the resulting semi-supervised framework outperforms the current fully-supervised multi-task learning state of the art on both the NYUD and Cityscapes datasets.
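The idea of reusing the cross-task distillation loss as an energy function, in place of a separate domain discriminator, can be illustrated with a small sketch. This is not the authors' implementation: the task names, the L1 distance, the hinge margin (borrowed from energy-based GAN formulations), and the function names `distillation_energy` and `adversarial_losses` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical container: task name -> flattened prediction map.
Pred = Dict[str, List[float]]


def l1(a: List[float], b: List[float]) -> float:
    """Mean absolute difference between two equally sized prediction maps."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distillation_energy(base: Pred, distilled: Pred) -> float:
    """Energy of one sample: the average gap between each task's direct
    prediction and the prediction reconstructed from the other tasks.
    Low energy means the task outputs are mutually coherent."""
    return sum(l1(base[t], distilled[t]) for t in base) / len(base)


def adversarial_losses(
    e_source: float, e_target: float, margin: float = 1.0
) -> Tuple[float, float]:
    """Assumed energy-based adversarial objective (hinge form).

    The distillation module acts as the critic: it keeps energy low on
    source samples and pushes energy on target samples up to `margin`.
    The encoder is trained to lower the target energy, which pulls
    target features towards the source distribution.
    """
    critic_loss = e_source + max(0.0, margin - e_target)
    encoder_loss = e_target
    return critic_loss, encoder_loss


# Toy usage with two illustrative tasks.
base = {"depth": [0.0, 1.0], "normals": [1.0, 1.0]}
distilled = {"depth": [0.0, 0.5], "normals": [1.0, 1.0]}
energy = distillation_energy(base, distilled)
critic_loss, encoder_loss = adversarial_losses(e_source=0.25, e_target=0.5)
```

In this sketch the same quantity serves both as a consistency regularizer and as the adversarial signal, which is the design choice the abstract highlights; the exact loss form used in the paper may differ.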