CL · LG · May 25, 2022

Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts

arXiv:2205.12701v2 · 295 citations · h-index: 42
Originality: Incremental advance
AI Analysis

This work addresses inefficient multi-tasking in NLP models, a problem relevant to researchers and practitioners, and offers a more flexible approach inspired by human cognition. It is an incremental advance, since it builds on existing mixture-of-experts methods.

The paper tackles a limitation of multi-task transformer models, which use the same parameters for all tasks, by proposing task-level mixture-of-experts models with dynamic routing. This yields a 2.6% improvement in the average performance gain (ARG) metric for few-shot adaptation and a 5.6% improvement for zero-shot generalization.
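To make the metric concrete, here is a minimal sketch of an average relative gain computation. It assumes ARG is the mean, over target tasks, of each task's relative improvement of a method over a baseline, expressed in percent; this definition is an assumption, and the function name `average_relative_gain` and the task scores below are hypothetical rather than taken from the paper.

```python
from typing import Dict


def average_relative_gain(method: Dict[str, float], baseline: Dict[str, float]) -> float:
    """Mean relative gain (in percent) of `method` over `baseline` across shared tasks.

    Assumed definition: ARG = mean over tasks t of (method[t] - baseline[t]) / baseline[t], times 100.
    """
    tasks = sorted(set(method) & set(baseline))
    if not tasks:
        raise ValueError("method and baseline share no tasks")
    # Relative improvement on each task, as a fraction of the baseline score.
    gains = [(method[t] - baseline[t]) / baseline[t] for t in tasks]
    return 100.0 * sum(gains) / len(gains)
```

For example, `average_relative_gain({"a": 55.0, "b": 80.0}, {"a": 50.0, "b": 80.0})` returns 5.0: a 10% gain on one task and no gain on the other average to 5%. Averaging relative rather than absolute gains keeps tasks with very different score scales from dominating the aggregate.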

Recent works suggest that transformer models are capable of multi-tasking on diverse NLP tasks and adapting to new tasks efficiently. However, the potential of these multi-task models may be limited as they use the same set of parameters for all tasks. In contrast, humans tackle tasks in a more flexible way, by making proper presumptions on what skills and knowledge are relevant and executing only the necessary computations. Inspired by this, we propose to use task-level mixture-of-expert models, which have a collection of transformer layers (i.e., experts) and a router component that chooses from these experts dynamically and flexibly. We find that these models help improve the average performance gain (ARG) metric by 2.6% when adapting to unseen tasks in the few-shot setting and by 5.6% in the zero-shot generalization setting. Further, we show that the learned routing decisions partly rediscover human categorization of NLP tasks: certain experts are strongly associated with extractive tasks, some with classification tasks, and some with tasks requiring world knowledge.
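The routing idea can be sketched as follows. This is a minimal, pure-Python illustration under stated assumptions, not the authors' implementation: each "expert" is a hypothetical toy vector transform standing in for a transformer layer, the router is a random linear map from a task embedding to per-layer expert scores, and routing is hard top-1 per layer. Because the router sees only the task embedding, every example of a task takes the same path through the experts, which is what makes the routing task-level rather than token-level.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax over router scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale: float) -> Callable[[Vector], Vector]:
    # Hypothetical stand-in for a transformer layer: x + scale * x, element-wise.
    return lambda x: [xi + scale * xi for xi in x]


class TaskLevelMoE:
    """Toy task-level mixture-of-experts: one expert chosen per layer, per task."""

    def __init__(self, num_layers: int, num_experts: int, task_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # experts[l][e] is expert e at layer l (assumed toy experts).
        self.experts = [
            [make_expert(0.1 * (e + 1)) for e in range(num_experts)]
            for _ in range(num_layers)
        ]
        # Router weights (assumed linear): one num_experts x task_dim matrix per layer.
        self.router = [
            [[rng.gauss(0.0, 1.0) for _ in range(task_dim)] for _ in range(num_experts)]
            for _ in range(num_layers)
        ]

    def route(self, task_emb: Vector) -> List[int]:
        """Choose one expert index per layer from the task embedding alone."""
        choices = []
        for layer_w in self.router:
            scores = [sum(w * t for w, t in zip(row, task_emb)) for row in layer_w]
            probs = softmax(scores)
            # Hard top-1 selection; a trainable router would need a soft or
            # straight-through relaxation instead (not shown here).
            choices.append(max(range(len(probs)), key=probs.__getitem__))
        return choices

    def forward(self, x: Vector, task_emb: Vector) -> Vector:
        # Every input of the same task follows the same expert path.
        for layer, e in zip(self.experts, self.route(task_emb)):
            x = layer[e](x)
        return x
```

With `model = TaskLevelMoE(num_layers=3, num_experts=4, task_dim=5)`, calling `model.route(task_emb)` twice with the same embedding returns the same three expert indices, while different task embeddings may select different experts at each layer. Inspecting which tasks share experts is, in spirit, how one could look for the task groupings the abstract describes.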

Code Implementations: 1 repo