CL · Jun 22, 2021

A Comprehensive Comparison of Pre-training Language Models

arXiv:2106.11483v93 citations
Originality: Synthesis-oriented
AI Analysis

This work provides an incremental comparison for NLP researchers, highlighting limited gains from model variations and suggesting data-centric approaches.

The paper compared pre-trained transformer-based language models under controlled conditions and found that adding an RNN layer to BERT improved short text understanding, but similar BERT structures showed no remarkable improvements, with data-centric methods performing better.

Recently, the development of pre-trained language models has brought natural language processing (NLP) tasks to a new state of the art. In this paper we explore the efficiency of various pre-trained language models. We pre-train a list of transformer-based models with the same amount of text and the same number of training steps. The experimental results show that the largest improvement over the original BERT comes from adding an RNN layer to capture more contextual information for short text understanding. Overall, however, similar BERT structures bring no remarkable improvement for short text understanding. A data-centric method [12] can achieve better performance.
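
The abstract does not say where the RNN layer sits or which kind of RNN is used. As a rough illustration only, the sketch below assumes a simple tanh (Elman-style) recurrent layer that reads the encoder's per-token output vectors in order and keeps the final state as a summary of the short text. The names `RecurrentHead` and `fake_encoder_outputs` are hypothetical, the weights are random and untrained, and random vectors stand in for real BERT outputs; none of this is taken from the paper.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a small random weight matrix as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RecurrentHead:
    """Hypothetical Elman-style tanh RNN run over per-token encoder outputs.

    Assumption: the recurrent layer sits on top of the encoder. The paper's
    actual placement and cell type are not stated in the abstract.
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.w_in = make_matrix(hidden_size, input_size, rng)
        self.w_rec = make_matrix(hidden_size, hidden_size, rng)
        self.bias = [0.0] * hidden_size

    def forward(self, token_vectors):
        """Read token vectors left to right and return the final hidden state."""
        state = [0.0] * self.hidden_size
        for vec in token_vectors:
            from_input = matvec(self.w_in, vec)
            from_state = matvec(self.w_rec, state)
            state = [
                math.tanh(a + b + c)
                for a, b, c in zip(from_input, from_state, self.bias)
            ]
        return state


def fake_encoder_outputs(num_tokens, width, seed=1):
    """Stand-in for BERT outputs: one random vector per token (not real data)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(num_tokens)]


tokens = fake_encoder_outputs(num_tokens=6, width=8)
head = RecurrentHead(input_size=8, hidden_size=4)
summary = head.forward(tokens)
print(len(summary))
```

In a real setup the summary vector would feed a task-specific classifier and the whole stack would be trained end to end; a framework implementation with a gated cell such as an LSTM or GRU would be the more likely choice than this hand-rolled layer.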

