ML, LG · Sep 25, 2024

Learning Representation for Multitask learning through Self Supervised Auxiliary learning

arXiv:2409.16651v1 · 5.5 · 2 citations · h-index: 4

Originality: Incremental advance

AI Analysis

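This work addresses a key bottleneck in multi-task learning that matters to researchers and practitioners alike: the quality of the representations produced by the shared encoder. It is an incremental advance, however, because it builds on existing hard parameter sharing methods.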

The paper tackles the problem of improving representation quality in multi-task learning by proposing Dummy Gradient norm Regularization (DGR), which enhances the universality of the shared encoder's representations and leads to better prediction performance on benchmark datasets.

Multi-task learning is a popular machine learning approach that enables simultaneous learning of multiple related tasks, improving algorithmic efficiency and effectiveness. In the hard parameter sharing approach, an encoder shared across multiple tasks generates data representations that are passed to task-specific predictors. It is therefore crucial for the shared encoder to provide good representations for each and every task. However, despite recent advances in multi-task learning, how to improve the quality of the representations generated by the shared encoder remains an open question. To address this gap, we propose a novel approach called Dummy Gradient norm Regularization (DGR) that aims to improve the universality of the representations generated by the shared encoder. Specifically, the method decreases the norm of the gradient of the loss function with respect to dummy task-specific predictors, thereby improving the universality of the shared encoder's representations. Through experiments on multiple multi-task learning benchmark datasets, we demonstrate that DGR effectively improves the quality of the shared representations, leading to better multi-task prediction performance. When paired with various classifiers, the shared representations generated by DGR also outperform those of existing multi-task learning methods. Moreover, our approach is computationally efficient because of its simplicity, and this simplicity also allows DGR to be integrated seamlessly with existing multi-task learning algorithms.
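To make the mechanism described in the abstract more concrete, here is a minimal, dependency-free Python sketch of a DGR-style penalty in a toy setting that is an assumption, not the paper's setup: a linear shared encoder z = Wx, one fixed, randomly initialized linear dummy head v per task, and a squared-error loss. For each task it computes the gradient g of that task's loss with respect to the dummy head, adds ||g||² to the penalty, and returns the hand-derived gradient of the penalty with respect to the encoder weights W. The function name `dgr_penalty_and_grad`, the use of the squared norm, the per-task sum, and keeping v fixed are all hypothetical choices; the paper's exact formulation (norm vs. squared norm, how dummy predictors are drawn, which losses are used) may differ.

```python
import random


def matvec(W, x):
    """Multiply a k x d matrix (list of rows) by a length-d vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def dgr_penalty_and_grad(W, X, tasks):
    """Toy DGR-style penalty for a linear shared encoder z = W x (hypothetical setup).

    tasks: list of (v, ys) pairs; v is a dummy linear head for that task
    (held fixed here, an assumption), ys are the task's regression targets.
    Per task: L = mean_i 0.5 * (v . z_i - y_i)^2, g = dL/dv, penalty += ||g||^2.
    Returns (penalty, dpenalty/dW); only the encoder W receives gradients.
    """
    n = len(X)
    k, d = len(W), len(W[0])
    Z = [matvec(W, x) for x in X]  # shared representations
    penalty = 0.0
    grad = [[0.0] * d for _ in range(k)]
    for v, ys in tasks:
        r = [dot(v, z) - y for z, y in zip(Z, ys)]  # residuals of the dummy head
        # g = dL/dv = (1/n) * sum_i r_i * z_i
        g = [sum(ri * z[a] for ri, z in zip(r, Z)) / n for a in range(k)]
        penalty += dot(g, g)
        # d||g||^2 / dW[b][c] = (2/n) * sum_i x_i[c] * ((g . z_i) * v[b] + r_i * g[b])
        for x, z, ri in zip(X, Z, r):
            gz = dot(g, z)
            for b in range(k):
                coef = 2.0 / n * (gz * v[b] + ri * g[b])
                for c in range(d):
                    grad[b][c] += coef * x[c]
    return penalty, grad


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, n = 4, 3, 16
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    W = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(k)]
    # Two toy tasks, each with a random dummy head and random targets.
    tasks = [([rng.gauss(0, 1) for _ in range(k)],
              [rng.gauss(0, 1) for _ in range(n)]) for _ in range(2)]
    penalty, grad = dgr_penalty_and_grad(W, X, tasks)
    print(f"DGR penalty: {penalty:.4f}")
```

In training, this gradient would be combined with the task-loss gradients, e.g. W ← W − lr·(∇_W L_task + λ·∇_W R) with λ a hypothetical weight. The penalty alone is trivially minimized by W = 0 (all representations collapse to zero, so every dummy gradient vanishes), which is why it only makes sense as a regularizer alongside the task losses. In a deep-learning framework the same quantity could instead be obtained by double backpropagation, for instance by calling `torch.autograd.grad(..., create_graph=True)` on the dummy-head loss and backpropagating the resulting norm into the encoder.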
