LG AI Sep 21, 2021

Improved optimization strategies for deep Multi-Task Networks

Lucas Pascal, Pietro Michiardi, Xavier Bost, Benoit Huet, Maria A. Zuluaga

arXiv:2109.11678v3, 5.56 citations

Originality: Incremental advance

AI Analysis

This work addresses optimization challenges in multi-task learning for computer vision, offering a trade-off between performance and computational efficiency, but it is incremental as it builds on existing methods.

The paper proposes an alternative to weighted-average objectives for optimizing multi-task networks: alternating independent gradient descent steps on the task-specific objectives, combined with random task grouping. It reports better overall performance on three visual MTL datasets and a wider exploration of the shared parameter space.

In Multi-Task Learning (MTL), it is common practice to train multi-task networks by optimizing an objective function that is a weighted average of the task-specific objective functions. Although the computational advantages of this strategy are clear, the complexity of the resulting loss landscape has not been studied in the literature. Arguably, its optimization may be more difficult than a separate optimization of the constituent task-specific objectives. In this work, we investigate the benefits of such an alternative by alternating independent gradient descent steps on the different task-specific objective functions, and we formulate a novel way to combine this approach with state-of-the-art optimizers. As the separation of task-specific objectives comes at the cost of increased computational time, we propose random task grouping as a trade-off between better optimization and computational efficiency. Experimental results on three well-known visual MTL datasets show better overall absolute performance on losses and standard metrics compared to an averaged objective function and other state-of-the-art MTL methods. In particular, our method shows the most benefit when dealing with tasks of different natures, and it enables a wider exploration of the shared parameter space. We also show that our random grouping strategy makes it possible to trade off these benefits against computational efficiency.
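The contrast the abstract draws between one step on an averaged objective and alternating steps on separate task objectives can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: it assumes a single scalar shared parameter, quadratic task losses, and plain gradient descent (the paper's combination with state-of-the-art optimizers is not reproduced here). All names (`random_groups`, `averaged_step`, `alternating_step`, `centers`) are hypothetical.

```python
import random


def random_groups(num_tasks, num_groups, rng):
    """Randomly partition task indices into `num_groups` non-empty groups."""
    if not 1 <= num_groups <= num_tasks:
        raise ValueError("num_groups must be between 1 and num_tasks")
    tasks = list(range(num_tasks))
    rng.shuffle(tasks)
    # Striding over the shuffled list yields groups of near-equal size.
    return [tasks[i::num_groups] for i in range(num_groups)]


def averaged_step(theta, grads, lr):
    """Baseline: one gradient step on the uniform average of all task losses."""
    g = sum(grad(theta) for grad in grads) / len(grads)
    return theta - lr * g


def alternating_step(theta, grads, lr, num_groups, rng):
    """One pass of alternating steps: a separate update per random task group."""
    for group in random_groups(len(grads), num_groups, rng):
        # Each group contributes its own step, computed at the current theta.
        g = sum(grads[t](theta) for t in group) / len(group)
        theta = theta - lr * g
    return theta


# Assumed toy tasks: L_t(theta) = 0.5 * (theta - c_t)^2, so grad = theta - c_t.
centers = [-1.0, 0.0, 2.0, 3.0]
grads = [lambda th, c=c: th - c for c in centers]

rng = random.Random(0)
theta = 0.0
for _ in range(200):
    theta = alternating_step(theta, grads, lr=0.05, num_groups=2, rng=rng)
print(theta)
```

In this sketch, `num_groups=1` reduces to the averaged baseline, while `num_groups=len(grads)` takes a fully separate step per task. Intermediate values illustrate the kind of trade-off between separation and the number of update steps per pass that the abstract attributes to random grouping.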
