Adaptive Scheduling for Multi-Task Learning
This work addresses performance trade-offs in multi-task learning for machine translation and particularly benefits low-resource languages, though the contribution is incremental because it builds on existing scheduling techniques.
The paper tackles the problem of training neural machine translation models on multiple languages. It explores adaptive task scheduling methods that improve performance on low-resource language pairs while minimizing negative effects on high-resource tasks, yielding better multilingual models.
To train neural machine translation models simultaneously on multiple tasks (languages), it is common to sample each task uniformly or in proportion to dataset sizes. As these methods offer little control over performance trade-offs, we explore different task scheduling approaches. We first consider existing non-adaptive techniques, then move on to adaptive schedules that over-sample tasks with poorer results relative to their respective baselines. As explicit schedules can be inefficient, especially if one task is highly over-sampled, we also consider implicit schedules, which instead learn to scale the learning rates or gradients of individual tasks. These techniques allow training multilingual models that perform better for low-resource language pairs (tasks with a small amount of data), while minimizing negative effects on high-resource tasks.
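The contrast between an explicit adaptive schedule and an implicit one can be illustrated with a minimal sketch. It is not the paper's exact formulation: the inverse-ratio weighting, the `temperature` parameter, the function names, and the example scores are all assumptions chosen for illustration. The explicit variant turns "how far is each task below its baseline" into sampling probabilities; the implicit variant keeps uniform sampling and converts the same probabilities into per-task learning-rate multipliers.

```python
import random


def adaptive_probs(scores, baselines, temperature=1.0, eps=1e-8):
    """Sampling probabilities that favor tasks lagging behind their baseline.

    scores, baselines: dict mapping task name -> metric where higher is
    better (for example BLEU). The weighting rule is an illustrative
    assumption, not the formula from the paper.
    """
    # Ratio below 1 means the task currently underperforms its baseline.
    ratios = {t: scores[t] / (baselines[t] + eps) for t in scores}
    # Invert the ratio so weaker tasks receive a larger unnormalized weight.
    raw = {t: (1.0 / max(r, eps)) ** (1.0 / temperature) for t, r in ratios.items()}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}


def sample_task(probs, rng):
    """Explicit schedule: draw the task for the next batch."""
    tasks = list(probs)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


def implicit_scales(probs):
    """Implicit schedule: per-task learning-rate multipliers.

    With uniform sampling, multiplying a task's learning rate by
    p * n_tasks weights it like sampling with probability p would
    (in expectation, for plain SGD).
    """
    n_tasks = len(probs)
    return {t: p * n_tasks for t, p in probs.items()}


# Hypothetical language pairs and scores, for illustration only.
scores = {"en-de": 24.0, "en-tr": 8.0}
baselines = {"en-de": 25.0, "en-tr": 12.0}

probs = adaptive_probs(scores, baselines)
scales = implicit_scales(probs)

rng = random.Random(0)
next_task = sample_task(probs, rng)
```

In this sketch the low-resource pair, which sits further below its baseline, is sampled more often under the explicit schedule. Under the implicit schedule every task is visited equally often and only the step size changes, which avoids repeatedly cycling through a small dataset when one task would otherwise be heavily over-sampled.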