CL · Oct 23, 2020

GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method

arXiv:2010.12532v1
Originality: Incremental advance
AI Analysis

The paper addresses the limited explicit linguistic knowledge of pre-trained language models, a problem relevant to NLP practitioners. The advance is incremental because it builds on the existing BERT architecture.

The paper tackles the lack of explicit linguistic knowledge in BERT by proposing a method to inject word embeddings into its layers, resulting in performance improvements on semantic similarity datasets, with qualitative analysis showing benefits for synonym pairs.

Large pre-trained language models such as BERT have been the driving force behind recent improvements across many NLP tasks. However, BERT is only trained to predict missing words - either behind masks or in the next sentence - and has no knowledge of lexical, syntactic or semantic information beyond what it picks up through unsupervised pre-training. We propose a novel method to explicitly inject linguistic knowledge in the form of word embeddings into any layer of a pre-trained BERT. Our performance improvements on multiple semantic similarity datasets when injecting dependency-based and counter-fitted embeddings indicate that such information is beneficial and currently missing from the original model. Our qualitative analysis shows that counter-fitted embedding injection particularly helps with cases involving synonym pairs.
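The abstract does not spell out how the embeddings are injected; the title only says the method is gated and lightweight. As a rough illustration of what a gated injection into one layer could look like, the sketch below adds a projected external embedding to each token's hidden state, scaled by a tanh gate. The function names, the weight matrices `w_proj` and `w_gate`, and the exact formula are assumptions made for illustration, not the paper's formulation.

```python
import math


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a matrix (d_in x d_out)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def gated_injection(hidden, external, w_proj, w_gate):
    """Inject external word embeddings into one layer's hidden states.

    hidden:   list of per-token hidden vectors (length d_model each)
    external: list of per-token external embeddings (length d_ext each)
    w_proj:   hypothetical learned projection, d_ext x d_model
    w_gate:   hypothetical learned gate weights, d_ext x d_model
    """
    out = []
    for h, e in zip(hidden, external):
        # Map the external embedding into the model's hidden dimension.
        projected = matvec(e, w_proj)
        # The gate decides, per dimension, how much of it to let through.
        gate = [math.tanh(x) for x in matvec(e, w_gate)]
        # Residual-style update: the original hidden state is preserved.
        out.append([hi + g * p for hi, g, p in zip(h, gate, projected)])
    return out


# Toy inputs: two tokens, d_model = 2, d_ext = 3.
hidden = [[1.0, 2.0], [3.0, 4.0]]
# The second token has no external embedding, represented here as zeros.
external = [[0.5, -0.5, 1.0], [0.0, 0.0, 0.0]]
w_proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# With all-zero gate weights the gate is closed and the layer is unchanged.
w_gate_zero = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
closed = gated_injection(hidden, external, w_proj, w_gate_zero)

# With non-zero gate weights the first token's hidden state is shifted.
w_gate = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
injected = gated_injection(hidden, external, w_proj, w_gate)
```

In this sketch a closed gate, or a token with an all-zero external embedding, leaves the hidden state exactly as BERT produced it, which is one reason a gate of this kind can be added to a pre-trained model without disturbing it at the start of fine-tuning.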

