LG, Jan 31, 2023

GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning

arXiv:2301.13465v1 · 8 citations · h-index: 13
Originality: Highly original
AI Analysis

This paper addresses performance degradation in multi-task learning for AI applications. It represents an incremental improvement built on a novel gradient-manipulation technique.

The paper tackles negative transfer in multi-task learning caused by conflicting gradients. It proposes GDOD, an optimization method that decomposes task gradients into task-shared and task-conflict components and uses the shared components to guide updates. On multiple datasets, GDOD outperforms state-of-the-art methods as measured by AUC and Logloss.
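
To make "conflicting gradients" concrete, the minimal sketch below uses a common definition from the gradient-manipulation literature, assumed here rather than taken from the paper: two task gradients conflict when their inner product is negative, because a descent step on one task then increases the other task's loss to first order. The helper names `dot` and `conflicting` and the example gradient values are hypothetical.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def conflicting(g_a: Sequence[float], g_b: Sequence[float]) -> bool:
    """Assumed definition: two task gradients conflict if their inner product is negative."""
    return dot(g_a, g_b) < 0


# Hypothetical gradients of two tasks w.r.t. the same shared parameters.
g_task_a = [1.0, 0.5]
g_task_b = [-0.8, 0.6]
print(conflicting(g_task_a, g_task_b))  # True
```

Here the inner product is about -0.5, so stepping against `g_task_a` would, to first order, raise the loss of the second task.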

Multi-task learning (MTL) aims to solve multiple related tasks simultaneously and has grown rapidly in recent years. However, MTL models often suffer from performance degeneration with negative transfer because several tasks are learned simultaneously. Some related work attributes the problem to conflicting gradients, in which case gradient updates that are useful for all tasks must be selected carefully. To this end, we propose a novel optimization approach for MTL, named GDOD, which manipulates the gradient of each task using an orthogonal basis decomposed from the span of all task gradients. GDOD explicitly decomposes gradients into task-shared and task-conflict components and adopts a general update rule that avoids interference across all task gradients, so that update directions are guided by the task-shared components. Moreover, we prove the convergence of GDOD theoretically under both convex and non-convex assumptions. Experimental results on several multi-task datasets not only demonstrate that GDOD significantly improves existing MTL models but also show that it outperforms state-of-the-art optimization methods in terms of AUC and Logloss.
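
The decomposition described in the abstract can be sketched as follows, under stated assumptions: an orthonormal basis of the span of the task gradients is built with modified Gram-Schmidt, a basis direction is labeled task-shared when every task's coefficient along it has the same sign and task-conflict otherwise, and only the shared components are summed into the update. This is one plausible reading of the abstract, not the authors' exact update rule; the function names, the sign test, and the choice to simply drop conflict directions are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(vectors: List[Vector], eps: float = 1e-10) -> List[Vector]:
    """Orthonormal basis for span(vectors) via modified Gram-Schmidt."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for u in basis:
            proj = dot(w, u)  # project the running residual (modified GS)
            w = [wi - proj * ui for wi, ui in zip(w, u)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip gradients already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def gdod_style_update(grads: List[Vector]) -> Vector:
    """Hypothetical GDOD-like combination: keep only task-shared directions.

    A basis direction counts as task-shared when all tasks' coefficients
    along it share a sign; otherwise it is a task-conflict direction and
    is dropped (an assumption, not necessarily the paper's rule).
    """
    update = [0.0] * len(grads[0])
    for u in gram_schmidt(grads):
        coeffs = [dot(g, u) for g in grads]  # each task's component along u
        shared = all(c >= 0 for c in coeffs) or all(c <= 0 for c in coeffs)
        if shared:
            total = sum(coeffs)
            update = [x + total * ui for x, ui in zip(update, u)]
    return update


# Two tasks whose coefficients disagree in sign along the first basis direction only.
print(gdod_style_update([[1.0, 0.0], [-1.0, 1.0]]))  # [0.0, 1.0]
```

When no direction conflicts, the result reduces to the plain sum of the task gradients (for example, `[[1.0, 0.0], [1.0, 1.0]]` gives `[2.0, 1.0]`). Because Gram-Schmidt processes gradients in order, the basis in this sketch depends on task order; a faithful implementation may handle that differently.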
