CL · Sep 13, 2021

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

arXiv:2109.05729v4 · 180 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, flexible pre-trained models for Chinese NLP. It is an incremental advance, since it builds on existing pre-trained models and architectures.

The authors set out to build a single pre-trained model for both Chinese language understanding and generation. They propose CPT, an unbalanced Transformer with a shared encoder and two decoders, which achieves competitive performance across a wide range of Chinese NLU and NLG tasks while reducing computational and storage costs.

In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT). Unlike previous Chinese PTMs, CPT is designed to exploit the knowledge shared between natural language understanding (NLU) and natural language generation (NLG) to boost performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two task-specific decoders, which share the encoder, are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With the partially shared architecture and multi-task pre-training, CPT can (1) learn task-specific knowledge for both NLU and NLG with its two decoders and (2) be fine-tuned flexibly in ways that fully exploit the potential of the model. Moreover, the unbalanced Transformer reduces computational and storage costs, which makes CPT competitive and greatly accelerates inference for text generation. Experimental results on a wide range of Chinese NLU and NLG tasks show the effectiveness of CPT.
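The abstract's claim that an unbalanced layout speeds up text generation can be illustrated with a rough back-of-the-envelope sketch. The code below is not the authors' implementation: the class and function names are hypothetical, and the layer counts (a deep 10-layer encoder with two shallow 2-layer decoders, compared against a balanced 6+6 model) are assumptions chosen only for illustration. It counts layer passes during autoregressive decoding, where the encoder runs once per input and the decoder runs once per generated token. Encoder and decoder layers do not cost exactly the same, so treat the numbers as a coarse proxy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnbalancedConfig:
    # Layer counts are illustrative assumptions, not values stated in the abstract.
    encoder_layers: int = 10
    understanding_decoder_layers: int = 2
    generation_decoder_layers: int = 2


def active_layers(cfg, mode):
    """Return the (module, layer count) pairs that run for a given mode."""
    if mode == "understanding":
        # NLU path: shared encoder followed by the understanding decoder.
        return [
            ("shared_encoder", cfg.encoder_layers),
            ("understanding_decoder", cfg.understanding_decoder_layers),
        ]
    if mode == "generation":
        # NLG path: shared encoder followed by the generation decoder.
        return [
            ("shared_encoder", cfg.encoder_layers),
            ("generation_decoder", cfg.generation_decoder_layers),
        ]
    raise ValueError(f"unknown mode: {mode!r}")


def generation_layer_passes(encoder_layers, decoder_layers, output_tokens):
    """Rough cost proxy: encoder runs once, decoder runs once per output token."""
    return encoder_layers + decoder_layers * output_tokens


cfg = UnbalancedConfig()
tokens = 50  # hypothetical output length

unbalanced = generation_layer_passes(
    cfg.encoder_layers, cfg.generation_decoder_layers, tokens
)
balanced = generation_layer_passes(6, 6, tokens)  # assumed balanced baseline

print("unbalanced layer passes:", unbalanced)
print("balanced layer passes:", balanced)
```

Under these assumed sizes, most of the depth sits in the encoder, which is paid for once, while the per-token cost depends only on the shallow decoder. The same shared encoder serves both paths, which is where the storage saving relative to two separate models would come from.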

Code Implementations: 1 repo
