JointDistill: Adaptive Multi-Task Distillation for Joint Depth Estimation and Scene Segmentation
This work addresses storage and training efficiency for joint modeling in intelligent transportation systems, though it appears incremental, adding novel components to existing distillation frameworks.
The paper tackles joint depth estimation and scene segmentation by proposing an adaptive multi-task distillation method that dynamically adjusts knowledge transfer from multiple teachers and uses a knowledge trajectory to prevent forgetting. The method achieves clear improvements over state-of-the-art solutions on datasets like Cityscapes and NYU-v2.
Depth estimation and scene segmentation are two important tasks in intelligent transportation systems. Jointly modeling these two tasks reduces both storage requirements and training effort. This work explores how multi-task distillation can be used to improve such unified modeling. While existing solutions transfer knowledge from multiple teachers in a static way, we propose a self-adaptive distillation method that dynamically adjusts the amount of knowledge taken from each teacher according to the student's current learning ability. Furthermore, because multiple teachers are involved, the student's gradient update direction during distillation is more prone to error, which can lead to knowledge forgetting. To avoid this, we propose a knowledge trajectory that records the most essential information the model has learnt in the past, and on top of it we design a trajectory-based distillation loss that guides the student to follow a similar learning curve in a cost-effective way. We evaluate our method on multiple benchmark datasets, including Cityscapes and NYU-v2. Compared to state-of-the-art solutions, our method achieves a clear improvement. The code is provided in the supplementary materials.
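The abstract does not say how the per-teacher amounts or the knowledge trajectory are computed. The following is a minimal Python sketch under our own assumptions, not the authors' method: (1) the student's "learning ability" on each task is approximated by the gap between the student's loss and that task teacher's loss, converted into teacher weights with a softmax (`adaptive_teacher_weights`, hypothetical); (2) the knowledge trajectory is approximated by a small bounded buffer of exponential-moving-average snapshots of the student's outputs (`KnowledgeTrajectory`, hypothetical), with the trajectory loss taken as the mean squared distance to those snapshots. All names and the `temperature`, `momentum`, `max_points` and `lam` parameters are illustrative.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Sequence


def adaptive_teacher_weights(student_losses: Dict[str, float],
                             teacher_losses: Dict[str, float],
                             temperature: float = 1.0) -> Dict[str, float]:
    """Hypothetical weighting: softmax over the per-task student-teacher loss gap.

    A larger gap (the student lags further behind that task's teacher) gives
    that teacher a larger share of the distillation signal.
    """
    gaps = {t: max(student_losses[t] - teacher_losses[t], 0.0) for t in student_losses}
    # Subtract the largest gap before exponentiating for numerical stability.
    m = max(gaps.values())
    exps = {t: math.exp((g - m) / temperature) for t, g in gaps.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


class KnowledgeTrajectory:
    """Hypothetical trajectory: a bounded buffer of EMA-smoothed past outputs."""

    def __init__(self, momentum: float = 0.9, max_points: int = 3):
        self.momentum = momentum
        self.ema: List[float] = []
        # deque with maxlen drops the oldest snapshot once the buffer is full.
        self.points: Deque[List[float]] = deque(maxlen=max_points)

    def record(self, outputs: Sequence[float]) -> None:
        """Update the running EMA with new outputs and store a snapshot of it."""
        if not self.ema:
            self.ema = list(outputs)
        else:
            self.ema = [self.momentum * e + (1.0 - self.momentum) * o
                        for e, o in zip(self.ema, outputs)]
        self.points.append(list(self.ema))

    def loss(self, outputs: Sequence[float]) -> float:
        """Mean squared distance from the current outputs to each stored snapshot."""
        if not self.points:
            return 0.0
        total = 0.0
        for p in self.points:
            total += sum((o - q) ** 2 for o, q in zip(outputs, p)) / len(p)
        return total / len(self.points)


def total_distill_loss(per_task_kd: Dict[str, float], weights: Dict[str, float],
                       traj_loss: float, lam: float = 0.1) -> float:
    """Weighted sum of per-teacher distillation losses plus a trajectory penalty."""
    return sum(weights[t] * per_task_kd[t] for t in per_task_kd) + lam * traj_loss
```

For example, with student losses of 0.8 (depth) and 0.4 (segmentation) against teacher losses of 0.3 for both, the depth teacher receives the larger weight (about 0.60 versus 0.40) because the student lags further behind it. Bounding the buffer with `max_points` keeps the memory cost of the trajectory constant, which is one plausible way to make such a loss cheap.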