CL, LG | Apr 13, 2021

Structural analysis of an all-purpose question answering model

Vincent Micheli, Quentin Heinrich, François Fleuret, Wacim Belblidia

arXiv:2104.06045v1 | 0.72 citations

Originality: Incremental advance

AI Analysis

The paper offers NLP practitioners insight into how multi-task learning works inside a Transformer, but the advance is incremental because it builds on existing Transformer analysis techniques.

The researchers investigated how an all-purpose question answering model maintains single-task performance without strong transfer effects, finding that attention heads specialize by task and vary in learning conduciveness.

Attention is a key component of the now ubiquitous pre-trained language models. By learning to focus on relevant pieces of information, these Transformer-based architectures have proven capable of tackling several tasks at once and sometimes even surpass their single-task counterparts. To better understand this phenomenon, we conduct a structural analysis of a new all-purpose question answering model that we introduce. Surprisingly, this model retains single-task performance even in the absence of a strong transfer effect between tasks. Through attention head importance scoring, we observe that attention heads specialize in a particular task and that some heads are more conducive to learning than others in both the multi-task and single-task settings.
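The abstract does not say how head importance is scored. As a rough illustration only, the sketch below uses a common gradient-based proxy: attach a mask variable to each head, and score the head by the mean absolute gradient of the loss with respect to its mask. The toy layer (output equals the masked sum of head outputs), the squared-error loss, and the names `head_importance`, `outputs`, and `targets` are all assumptions made for this example, not the authors' setup.

```python
import numpy as np

def head_importance(head_outputs, targets):
    """Score heads by the mean absolute gradient of the loss w.r.t. a head mask.

    head_outputs: array of shape (n_examples, n_heads, dim)
    targets:      array of shape (n_examples, dim)
    Toy assumption: layer output = sum_h m_h * o_h, loss = 0.5 * ||output - y||^2.
    """
    # With every mask m_h set to 1, the layer output is the plain sum over heads.
    residual = head_outputs.sum(axis=1) - targets             # (n_examples, dim)
    # Analytic gradient dL/dm_h = residual . o_h, evaluated at m = 1.
    grads = np.einsum("nd,nhd->nh", residual, head_outputs)   # (n_examples, n_heads)
    # Importance = average absolute sensitivity over the examples.
    return np.abs(grads).mean(axis=0)                         # (n_heads,)

rng = np.random.default_rng(0)
n_examples, n_heads, dim = 64, 4, 8
outputs = rng.normal(size=(n_examples, n_heads, dim))
outputs[:, 3, :] = 0.0  # hypothetical head that contributes nothing
targets = rng.normal(size=(n_examples, dim))

scores = head_importance(outputs, targets)
print(scores)
```

In this toy setting the silent head receives a score of exactly zero, since masking it cannot change the loss. Comparing such scores computed on different tasks' data is one way to look for the task specialization the abstract describes.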
