CL · LG · May 25, 2022

ORCA: Interpreting Prompted Language Models via Locating Supporting Data Evidence in the Ocean of Pretraining Data

UW
arXiv:2205.12600v1 · 34 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses an interpretability challenge faced by researchers and practitioners who use large language models: it offers insight into which pretraining data a prompted model depends on. It is an incremental advance, building on existing methods for model analysis.

The authors tackle the problem of understanding where prompted language models acquire task-specific knowledge in zero-shot setups. They propose ORCA, a method that iteratively uses gradient information from the downstream task to locate a small subset of pretraining data that supports the model's competence. In sentiment analysis and textual entailment, they find that BERT relies heavily on BookCorpus and on pretraining examples that mask out synonyms of the task verbalizers.

Large pretrained language models have been performing increasingly well in a variety of downstream tasks via prompting. However, it remains unclear from where the model learns the task-specific knowledge, especially in a zero-shot setup. In this work, we want to find evidence of the model's task-specific competence from pretraining and are specifically interested in locating a very small subset of pretraining data that directly supports the model in the task. We call such a subset supporting data evidence and propose a novel method ORCA to effectively identify it, by iteratively using gradient information related to the downstream task. This supporting data evidence offers interesting insights about the prompted language models: in the tasks of sentiment analysis and textual entailment, BERT shows a substantial reliance on BookCorpus, the smaller corpus of BERT's two pretraining corpora, as well as on pretraining examples that mask out synonyms to the task verbalizers.
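The abstract states only that ORCA iteratively uses gradient information related to the downstream task. Below is a minimal sketch of that idea under assumptions that are ours, not the paper's: a toy logistic-regression model stands in for BERT, each pretraining example is scored by the cosine similarity between its loss gradient and the mean task gradient, the top-scoring examples join the evidence set, and the model takes one SGD step on the evidence before re-scoring. The names `find_supporting_evidence`, `per_round`, and `lr` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy setup: an example is (feature vector, binary label).
Example = Tuple[List[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of the logistic loss of one example with respect to w."""
    err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
    return [err * xi for xi in x]


def mean_grad(w: Sequence[float], data: Sequence[Example]) -> List[float]:
    """Average per-example gradient over a dataset."""
    gs = [grad(w, x, y) for x, y in data]
    return [sum(col) / len(gs) for col in zip(*gs)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def find_supporting_evidence(w, task, pool, rounds=3, per_round=1, lr=0.5):
    """Assumed ORCA-like loop: score, select, update, repeat.

    Returns the indices of selected pool examples and the updated weights.
    """
    w = list(w)
    evidence: List[int] = []
    for _ in range(rounds):
        # Task gradient under the current (already updated) model.
        g_task = mean_grad(w, task)
        candidates = [i for i in range(len(pool)) if i not in evidence]
        # Rank unselected pool examples by gradient alignment with the task.
        candidates.sort(key=lambda i: cosine(grad(w, *pool[i]), g_task),
                        reverse=True)
        evidence.extend(candidates[:per_round])
        # One SGD step on all evidence selected so far.
        g_ev = mean_grad(w, [pool[i] for i in evidence])
        w = [wi - lr * gi for wi, gi in zip(w, g_ev)]
    return evidence, w


if __name__ == "__main__":
    task = [([1.0, 0.0], 1.0)]
    pool = [([1.0, 0.0], 1.0),   # aligned with the task
            ([0.0, 1.0], 1.0),   # orthogonal feature
            ([1.0, 0.0], 0.0),   # contradicts the task label
            ([1.0, 0.1], 1.0)]   # nearly aligned
    evidence, _ = find_supporting_evidence([0.0, 0.0], task, pool, rounds=2)
    print(evidence)  # → [0, 3]
```

Cosine similarity rather than a raw dot product keeps examples with large gradient norms from dominating the ranking; whether ORCA normalizes this way is an assumption of this sketch.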
