LDC-MTL: Balancing Multi-Task Learning through Scalable Loss Discrepancy Control
LDC-MTL addresses a scalability bottleneck for researchers and practitioners who use multi-task learning, although the contribution is incremental in that it builds on existing gradient manipulation methods.
The paper tackles the computational inefficiency of gradient manipulation methods in multi-task learning. It proposes LDC-MTL, a scalable loss discrepancy control approach that reduces time and memory overhead from O(K) to O(1), where K is the number of tasks, and it reports superior accuracy and efficiency in experiments.
Multi-task learning (MTL) has been widely adopted for its ability to simultaneously learn multiple tasks. While existing gradient manipulation methods often yield more balanced solutions than simple scalarization-based approaches, they typically incur a significant computational overhead of $\mathcal{O}(K)$ in both time and memory, where $K$ is the number of tasks. In this paper, we propose LDC-MTL, a simple and scalable loss discrepancy control approach for MTL, formulated from a bilevel optimization perspective. Our method incorporates two key components: (i) a bilevel formulation for fine-grained loss discrepancy control, and (ii) a scalable first-order bilevel algorithm that requires only $\mathcal{O}(1)$ time and memory. Theoretically, we prove that LDC-MTL guarantees convergence not only to a stationary point of the bilevel problem with loss discrepancy control but also to an $\epsilon$-accurate Pareto stationary point for all $K$ loss functions under mild conditions. Extensive experiments on diverse multi-task datasets demonstrate the superior performance of LDC-MTL in both accuracy and efficiency.
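To make the $\mathcal{O}(K)$ versus $\mathcal{O}(1)$ contrast concrete, the following toy sketch compares storing one gradient per task with accumulating a single weighted gradient. It is not the LDC-MTL algorithm: the quadratic losses, the task centers, and the fixed weights are hypothetical choices made only for illustration, and the function names are invented here. In an autodiff framework the single buffer would correspond to one backward pass through the shared parameters instead of $K$; the toy only shows the difference in stored gradients.

```python
# Toy illustration (NOT the LDC-MTL algorithm): K quadratic task losses
# l_i(theta) = 0.5 * ||theta - c_i||^2 sharing one parameter vector theta.
# All names, centers, and weights below are hypothetical.

def task_grad(theta, center):
    """Gradient of 0.5 * ||theta - center||^2 with respect to theta."""
    return [t - c for t, c in zip(theta, center)]

def per_task_gradients(theta, centers):
    """Gradient-manipulation style: materialize one gradient per task.

    Keeps K vectors, so memory grows linearly with the number of tasks.
    """
    return [task_grad(theta, c) for c in centers]

def scalarized_gradient(theta, centers, weights):
    """Weighted-sum style: accumulate into a single gradient buffer.

    Keeps one vector of size len(theta), regardless of K.
    """
    total = [0.0] * len(theta)
    for w, c in zip(weights, centers):
        for j, (t, cj) in enumerate(zip(theta, c)):
            total[j] += w * (t - cj)
    return total

theta = [1.0, -2.0]
centers = [[0.0, 0.0], [2.0, 1.0], [-1.0, 3.0]]  # K = 3 hypothetical tasks
weights = [0.5, 0.25, 0.25]                      # hypothetical fixed weights

grads = per_task_gradients(theta, centers)               # K stored vectors
combined = scalarized_gradient(theta, centers, weights)  # 1 stored vector

# The single buffer equals the weighted sum of the per-task gradients.
recombined = [
    sum(w * g[j] for w, g in zip(weights, grads)) for j in range(len(theta))
]
print(len(grads), combined, recombined)
```

The weights are fixed in this sketch; how LDC-MTL chooses them through its bilevel formulation is not specified in the abstract and is not modeled here.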