Task-Based MoE for Multitask Multilingual Machine Translation
This work aims to improve multitask, multilingual machine translation and is addressed to AI and NLP researchers. It is an incremental advance: it adapts existing MoE architectures by adding task-specific components.
The paper addresses the task-agnostic nature of mixture-of-experts (MoE) models. It proposes a method that incorporates task information at different levels of granularity through shared dynamic task-based adapters. The authors report advantages over dense and canonical MoE models on multitask multilingual machine translation, along with efficient generalization to new tasks.
The mixture-of-experts (MoE) architecture has proven to be a powerful method for training deep models across diverse tasks and applications. However, current MoE implementations are task-agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different levels of granularity with shared dynamic task-based adapters. Our experiments and analysis show the advantages of our approach over dense and canonical MoE models on multitask multilingual machine translation. With task-specific adapters, our models can additionally generalize to new tasks efficiently.
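To make the contrast between task-agnostic routing and task-based adapters concrete, the following is a minimal sketch in plain Python. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the top-1 token router, the task embedding table, the second router that maps a task to one adapter from a small shared pool, the single linear map standing in for an adapter, and all names and sizes (`route_token`, `route_task`, `task_moe_layer`, `D_MODEL`, and so on). It is not the authors' implementation.

```python
import math
import random

random.seed(0)

# Illustrative sizes only; none of these come from the paper.
D_MODEL = 4      # hidden size
N_EXPERTS = 3    # token-level experts
N_TASKS = 4      # e.g. translation directions
N_ADAPTERS = 2   # shared adapter pool, smaller than the number of tasks


def rand_matrix(rows, cols, scale=0.1):
    """Random rows x cols matrix stored as a list of rows."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters (randomly initialized, untrained).
token_gate = rand_matrix(N_EXPERTS, D_MODEL)
experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]
task_embed = rand_matrix(N_TASKS, D_MODEL, scale=1.0)
task_gate = rand_matrix(N_ADAPTERS, D_MODEL, scale=1.0)
adapters = [rand_matrix(D_MODEL, D_MODEL) for _ in range(N_ADAPTERS)]


def route_token(x):
    """Task-agnostic top-1 routing: the expert is chosen from the token alone."""
    probs = softmax(matvec(token_gate, x))
    idx = max(range(N_EXPERTS), key=lambda i: probs[i])
    return idx, probs[idx]


def route_task(task_id):
    """Assumed task-level routing: the adapter is chosen from the task embedding."""
    probs = softmax(matvec(task_gate, task_embed[task_id]))
    return max(range(N_ADAPTERS), key=lambda i: probs[i])


def task_moe_layer(x, task_id):
    """Expert output followed by a residual, task-selected adapter."""
    e_idx, e_prob = route_token(x)
    h = [e_prob * y for y in matvec(experts[e_idx], x)]
    a_idx = route_task(task_id)
    delta = matvec(adapters[a_idx], h)  # single linear map as a stand-in adapter
    out = [hi + di for hi, di in zip(h, delta)]
    return out, e_idx, a_idx
```

In this sketch, calling `task_moe_layer(x, task_id)` on two different tokens with the same `task_id` may pick different experts, but it always picks the same adapter, because the adapter choice depends only on the task. Under these assumptions, supporting a new task would mean adding a row to `task_embed` (and possibly a new adapter) while leaving the experts untouched, which is one plausible reading of the efficient generalization the abstract mentions.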