CL | AI | Oct 23, 2022

Language Model Pre-Training with Sparse Latent Typing

Microsoft
arXiv:2210.12582v2292 citations | h-index: 28 | Has Code
Originality: Highly original
AI Analysis

This work targets NLP researchers and practitioners who want language models that are both more interpretable and stronger on downstream tasks. It offers a new pre-training objective with demonstrated gains on Information Extraction.

The paper tackles the lack of interpretable latent representations in language model pre-training. It introduces Sparse Latent Typing, a pre-training objective that enables the model to sparsely extract sentence-level keywords with diverse latent types. The resulting model significantly improves Information Extraction tasks in both supervised and few-shot settings.

Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most of the LM pre-training objectives only focus on text reconstruction, but have not sought to learn latent-level interpretable representations of sentences. In this paper, we manage to push the language models to obtain a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge. Besides, the language model pre-trained with such an objective also significantly improves Information Extraction related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.

Code Implementations: 1 repo
