CV · Feb 27, 2024

A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track

arXiv:2402.17319v1 · 1 citation · h-index: 17
Originality: Synthesis-oriented
AI Analysis

This is an incremental solution for the multi-task robustness track of the 1st VCL Challenge, addressing dense visual prediction tasks.

The authors tackled multi-task robustness in dense visual prediction by proposing UniNet, a vanilla framework combining DETR3D, Mask2Former, and BinsFormer for 3D detection, instance segmentation, and depth estimation, achieving a 49.6 overall score on the SHIFT validation set.
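The combination described above can be pictured as one backbone whose features are passed to three task heads. The following is a minimal sketch of that structure only, assuming a single shared backbone as the "single model" wording suggests. `MultiTaskModel`, `toy_backbone`, and the lambda heads are hypothetical stand-ins and do not reproduce DETR3D, Mask2Former, BinsFormer, or InternImage-L.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = List[float]  # stand-in for a backbone feature map


@dataclass
class MultiTaskModel:
    """Hypothetical wrapper: one shared backbone, one head per task."""
    backbone: Callable[[List[float]], Features]
    heads: Dict[str, Callable[[Features], object]]

    def forward(self, image: List[float]) -> Dict[str, object]:
        # Features are computed once and reused by every task head.
        feats = self.backbone(image)
        return {name: head(feats) for name, head in self.heads.items()}


calls = {"backbone": 0}  # counts backbone invocations for illustration


def toy_backbone(image: List[float]) -> Features:
    # Placeholder for a real backbone; simply doubles each input value.
    calls["backbone"] += 1
    return [2.0 * x for x in image]


model = MultiTaskModel(
    backbone=toy_backbone,
    heads={
        # Placeholder heads, named after the three tasks in the report.
        "det3d": lambda f: {"num_boxes": len(f)},
        "instance_seg": lambda f: [x > 1.0 for x in f],
        "depth": lambda f: [x + 1.0 for x in f],
    },
)

outputs = model.forward([0.25, 0.5, 1.0])
```

The point of the sketch is that adding or swapping a task only touches the `heads` mapping, while the backbone runs once per input.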

In this report, we present our solution to the multi-task robustness track of the 1st Visual Continual Learning (VCL) Challenge at the ICCV 2023 Workshop. We propose a vanilla framework named UniNet that seamlessly combines various visual perception algorithms into a multi-task model. Specifically, we choose DETR3D, Mask2Former, and BinsFormer for 3D object detection, instance segmentation, and depth estimation, respectively. The final submission is a single model with an InternImage-L backbone and achieves a 49.6 overall score (29.5 Det mAP, 80.3 mTPS, 46.4 Seg mAP, and 7.93 SILog) on the SHIFT validation set. We also report several observations from our experiments that may facilitate the development of multi-task learning in dense visual prediction.
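For readers unfamiliar with the depth metric, SILog is the scale-invariant logarithmic error, where lower is better. The sketch below uses the common KITTI-style definition, the standard deviation of the per-pixel log difference multiplied by 100. That the challenge uses exactly this definition and scaling is an assumption, and the function name `silog` and its `scale` parameter are illustrative.

```python
import math
from typing import Sequence


def silog(pred: Sequence[float], gt: Sequence[float], scale: float = 100.0) -> float:
    """Scale-invariant log error over valid (positive) depth pairs.

    Assumed definition: scale * sqrt(mean((d - mean(d))^2)),
    with d = log(pred) - log(gt).
    """
    d = [math.log(p) - math.log(g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not d:
        raise ValueError("no valid depth pairs")
    mean_d = sum(d) / len(d)
    # Variance of the log difference; a constant offset in d contributes nothing.
    var = sum((x - mean_d) ** 2 for x in d) / len(d)
    return scale * math.sqrt(var)


# Predictions that are off by a constant factor incur (near) zero error.
scaled_error = silog([2.0, 4.0, 8.0], [1.0, 2.0, 4.0])

# One exact pixel and one pixel off by a factor of e.
mixed_error = silog([1.0, math.e], [1.0, 1.0])
```

Because only the spread of the log differences matters, a prediction that is wrong by a global scale factor is not penalized, which is what makes the metric scale-invariant.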

