CL · Dec 9, 2022

From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader

CMU
arXiv:2212.04755v3 · 5 citations · h-index: 51
Originality: Incremental advance
AI Analysis

This work addresses the discrepancy between MLM pre-training and downstream fine-tuning. It benefits NLP practitioners by providing a unified model for extraction and classification tasks, though the advance is incremental because it builds on existing pre-trained models.

The paper retrofits pre-trained masked language models into machine reading comprehension models without labeled data. The resulting model, PMR, shows tremendous improvements in low-resource scenarios and enables high-quality rationale extraction for explainability.

We present Pre-trained Machine Reader (PMR), a novel method for retrofitting pre-trained masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data. PMR can resolve the discrepancy between model pre-training and downstream fine-tuning of existing MLMs. To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data by using Wikipedia hyperlinks and designed a Wiki Anchor Extraction task to guide the MRC-style pre-training. Apart from its simplicity, PMR effectively solves extraction tasks, such as Extractive Question Answering and Named Entity Recognition. PMR shows tremendous improvements over existing approaches, especially in low-resource scenarios. When applied to the sequence classification task in the MRC formulation, PMR enables the extraction of high-quality rationales to explain the classification process, thereby providing greater prediction explainability. PMR also has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
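
The data construction described in the abstract can be illustrated with a minimal sketch. It assumes that each Wikipedia hyperlink is available as a (start, end, linked entity) triple over a passage, and that a short definition of the linked entity serves as the query. The names `MRCExample` and `build_example` and this record layout are hypothetical and chosen only for illustration; they are not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MRCExample:
    query: str                      # text describing the target entity (assumed: its definition)
    context: str                    # passage that may mention the entity
    answers: List[Tuple[int, int]]  # character spans [start, end) of matching anchors


def build_example(
    definition: str,
    passage: str,
    anchors: List[Tuple[int, int, str]],
    entity: str,
) -> MRCExample:
    """Build one MRC-style example from hyperlink anchors (hypothetical helper).

    Each anchor is (start, end, linked_entity). Anchors that link to `entity`
    become answer spans; an empty answer list marks a context with no mention.
    """
    spans = [(start, end) for start, end, target in anchors if target == entity]
    return MRCExample(query=definition, context=passage, answers=spans)


# Toy data: offsets are computed from the passage rather than hard-coded.
passage = "The Seine flows through Paris, the capital of France."
paris_start = passage.index("Paris")
france_start = passage.index("France")
anchors = [
    (paris_start, paris_start + len("Paris"), "Paris"),
    (france_start, france_start + len("France"), "France"),
]

definition = "Paris is the capital and largest city of France."
positive = build_example(definition, passage, anchors, entity="Paris")
negative = build_example("Berlin is the capital of Germany.", passage, anchors, entity="Berlin")

print([passage[s:e] for s, e in positive.answers])
print(negative.answers)
```

In this sketch, a context with no anchor for the queried entity yields an empty answer list, which is one simple way to represent unanswerable examples in an extraction setting.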

Code Implementations: 1 repo