CL · Nov 2, 2020

Influence Patterns for Explaining Information Flow in BERT

Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta

arXiv:2011.00740v32.221 citations

Originality: Incremental advance

AI Analysis

This work addresses the interpretability challenge facing researchers and practitioners who use transformer models such as BERT. It is an incremental advance in that it builds on existing attention-based and layer-based methods.

The authors tackle the problem of explaining information flow in BERT by introducing influence patterns, which quantify and localize the paths information takes through the model. They find that a significant portion of the information flows through skip connections rather than attention heads, and that patterns account for more model performance than previous attention-based and layer-based methods.

While "attention is all you need" may be proving true, we do not know why: attention-based transformer models such as BERT are superior, but how information flows from input tokens to output predictions is unclear. We introduce influence patterns, abstractions of sets of paths through a transformer model. Patterns quantify and localize the flow of information to paths passing through a sequence of model nodes. Experimentally, we find that a significant portion of information flow in BERT goes through skip connections instead of attention heads. We further show that consistency of patterns across instances is an indicator of BERT's performance. Finally, we demonstrate that patterns account for far more model performance than previous attention-based and layer-based methods.
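To make the idea of attributing information flow to paths concrete, here is a minimal sketch in Python. It is not the authors' method or code: it assumes a hypothetical toy model of two scalar residual layers of the form h -> h + w * h, where each layer offers a "skip" route and a "block" route. The names `WEIGHTS`, `path_influences`, and `total_gradient` are illustrative only. The sketch shows that the gradient of the output with respect to the input splits exactly into per-path contributions, so the share carried by the all-skip path can be read off directly.

```python
from itertools import product

# Hypothetical toy model (an assumption, not BERT): two scalar residual
# layers, each computing h -> h + w * h.
WEIGHTS = [0.5, -0.2]


def path_influences(weights):
    """Return the gradient contribution of every path through the layers.

    A path picks, for each layer, either the skip connection (local
    derivative 1) or the block (local derivative w). By the chain rule,
    the contribution of a path is the product of its local derivatives.
    """
    influences = {}
    for path in product(("skip", "block"), repeat=len(weights)):
        grad = 1.0
        for choice, w in zip(path, weights):
            grad *= 1.0 if choice == "skip" else w
        influences[path] = grad
    return influences


def total_gradient(weights):
    """Gradient of the output with respect to the input for the whole model."""
    grad = 1.0
    for w in weights:
        grad *= 1.0 + w
    return grad


influences = path_influences(WEIGHTS)
for path, value in influences.items():
    print(path, value)
print("sum over paths:", sum(influences.values()))
print("total gradient:", total_gradient(WEIGHTS))
```

In this toy setting the per-path values sum to the total gradient, which is what makes it meaningful to ask how much of the flow goes through skip connections. A real transformer has far more paths than can be enumerated this way, which is why an abstraction over sets of paths is needed.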
