CL · Jul 17, 2022

ELECTRA is a Zero-Shot Learner, Too

arXiv:2207.08141v2 · 10 citations · h-index: 26 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses zero-shot learning for NLP practitioners by showing that ELECTRA, a previously neglected discriminative pre-trained model, can outperform masked language models. The advance is incremental, since it adapts an existing model to a new paradigm (prompt learning).

The paper tackles zero-shot learning in NLP by proposing a replaced token detection (RTD)-based prompt learning method for ELECTRA. It reports state-of-the-art zero-shot performance, with average improvements of about 8.4% over RoBERTa-large and 13.7% over BERT-large across 15 tasks, including 90.1% accuracy on SST-2 without any training data.

Recently, for few-shot and even zero-shot learning, the new "pre-train, prompt, and predict" paradigm has achieved remarkable results compared with the "pre-train, fine-tune" paradigm. After the success of prompt-based GPT-3, a series of prompt learning methods based on masked language models (MLMs, e.g., BERT and RoBERTa) became popular and widely used. However, another efficient pre-trained discriminative model, ELECTRA, has probably been neglected. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using a novel replaced token detection (RTD)-based prompt learning method that we propose. Experimental results show that the ELECTRA model with RTD-prompt learning achieves surprisingly strong, state-of-the-art zero-shot performance. Numerically, compared to MLM-RoBERTa-large and MLM-BERT-large, our RTD-ELECTRA-large achieves average improvements of about 8.4% and 13.7%, respectively, across all 15 tasks. In particular, on the SST-2 task, our RTD-ELECTRA-large achieves an astonishing 90.1% accuracy without any training data. Overall, compared to pre-trained masked language models, the pre-trained replaced token detection model performs better in zero-shot learning. The source code is available at: https://github.com/nishiwen1214/RTD-ELECTRA.
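To make the RTD-prompt idea concrete, here is a minimal Python sketch of how such zero-shot classification could work, assuming this mechanism: each candidate label word is written into a prompt template, ELECTRA's discriminator estimates how likely that word is to be "replaced", and the label whose word looks most "original" is predicted. This is not the authors' code (the linked repository has that). The template `"{text} It was {label}."`, the verbalizer words `great`/`terrible`, averaging over multi-token label words, and the helper names `fill_template`, `rtd_zero_shot_predict`, `electra_replaced_prob` and `toy_replaced_prob` are all hypothetical. The `electra_replaced_prob` adapter assumes the Hugging Face `transformers` classes `ElectraTokenizerFast` and `ElectraForPreTraining` with the `google/electra-large-discriminator` checkpoint; the toy scorer lets the selection logic run without downloading a model.

```python
"""Sketch of RTD-based zero-shot prompting with an ELECTRA discriminator.

Hypothetical helpers for illustration; not the RTD-ELECTRA repository's code.
"""
from functools import lru_cache
from typing import Callable, Dict, Tuple

Span = Tuple[int, int]
# Hypothetical scorer: (prompt, char span of label word) -> P(label word is "replaced").
ReplacedProbFn = Callable[[str, Span], float]


def fill_template(template: str, text: str, label_word: str) -> Tuple[str, Span]:
    """Insert the input text and one candidate label word; return prompt and label span."""
    if "{label}" not in template:
        raise ValueError("template must contain a {label} slot")
    prefix, suffix = template.split("{label}", 1)
    head = prefix.format(text=text)
    prompt = head + label_word + suffix.format(text=text)
    return prompt, (len(head), len(head) + len(label_word))


def rtd_zero_shot_predict(
    text: str,
    template: str,
    verbalizer: Dict[str, str],
    replaced_prob: ReplacedProbFn,
) -> Tuple[str, Dict[str, float]]:
    """Pick the label whose word the discriminator considers most 'original'."""
    scores = {}
    for label, word in verbalizer.items():
        prompt, span = fill_template(template, text, word)
        # A label word that fits the context should look original (low replaced prob).
        scores[label] = 1.0 - replaced_prob(prompt, span)
    best = max(scores, key=scores.get)
    return best, scores


@lru_cache(maxsize=2)
def _load_electra(model_name: str):
    # Lazy import so the rest of the sketch runs without transformers installed.
    from transformers import ElectraForPreTraining, ElectraTokenizerFast

    tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
    model = ElectraForPreTraining.from_pretrained(model_name).eval()
    return tokenizer, model


def electra_replaced_prob(
    prompt: str, span: Span, model_name: str = "google/electra-large-discriminator"
) -> float:
    """Mean sigmoid of the discriminator logits over the label word's sub-tokens."""
    import torch

    tokenizer, model = _load_electra(model_name)
    enc = tokenizer(prompt, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        logits = model(**enc).logits[0]  # one logit per token; > 0 means "replaced"
    start, end = span
    # Keep sub-tokens overlapping the label span; special tokens have empty (0, 0) offsets.
    idx = [i for i, (s, e) in enumerate(offsets) if e > s and s < end and e > start]
    return torch.sigmoid(logits[idx]).mean().item()


def toy_replaced_prob(prompt: str, span: Span) -> float:
    """Stand-in scorer: 'great' fits texts mentioning 'love'; other words fit the rest."""
    word = prompt[span[0]:span[1]]
    return 0.1 if (word == "great") == ("love" in prompt) else 0.9


if __name__ == "__main__":
    verbalizer = {"positive": "great", "negative": "terrible"}
    label, scores = rtd_zero_shot_predict(
        "I love this movie.", "{text} It was {label}.", verbalizer, toy_replaced_prob
    )
    print(label, scores)
    # Swap in electra_replaced_prob to score with the real discriminator (downloads weights).
```

The design choice worth noting is that no `[MASK]` token is involved: each label word is filled in and judged directly by the discriminator (one forward pass per label word), which is what separates this style of prompting from MLM-based prompting.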

Code Implementations: 1 repo