E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT
This work addresses BERT's overreliance on the surface forms of entity names and offers an efficient way to enrich BERT with entity knowledge, useful to researchers and practitioners in natural language processing.
The authors tackled the problem of injecting factual knowledge about entities into BERT without expensive retraining. They aligned Wikipedia2Vec entity vectors with BERT's wordpiece vector space, and the resulting model, E-BERT, outperformed BERT and other baselines on unsupervised question answering (QA), supervised relation classification (RC), and entity linking (EL).
We present a novel way of injecting factual knowledge about entities into the pretrained BERT model (Devlin et al., 2019): We align Wikipedia2Vec entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to ERNIE (Zhang et al., 2019) and KnowBert (Peters et al., 2019), but it requires no expensive further pretraining of the BERT encoder. We evaluate E-BERT on unsupervised question answering (QA), supervised relation classification (RC) and entity linking (EL). On all three tasks, E-BERT outperforms BERT and other baselines. We also show quantitatively that the original BERT model is overly reliant on the surface form of entity names (e.g., guessing that someone with an Italian-sounding name speaks Italian), and that E-BERT mitigates this problem.
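The alignment idea can be illustrated with a small toy sketch. The abstract does not specify how the alignment is computed, so the sketch assumes a linear map W fitted by least squares on words that occur both in the Wikipedia2Vec word vocabulary and in BERT's wordpiece vocabulary. W is then applied to entity vectors so that they can stand in for wordpieces in the input sequence. The helper names (`fit_alignment`, `map_vector`, `embed_sequence`) and all vectors are hypothetical. A real setup would use BERT's actual input-embedding matrix and pretrained Wikipedia2Vec vectors.

```python
# Toy sketch of E-BERT-style alignment. ASSUMPTION: the alignment is a plain
# linear map fitted by least squares; all vocabularies/vectors are hypothetical.
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is square and non-singular.
    """
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_alignment(src: Sequence[Vector], tgt: Sequence[Vector]) -> Matrix:
    """Fit W minimising sum ||x W - y||^2 via the normal equations (X^T X) W = X^T Y."""
    d_src, d_tgt = len(src[0]), len(tgt[0])
    xtx = [[sum(x[i] * x[j] for x in src) for j in range(d_src)]
           for i in range(d_src)]
    xty = [[sum(x[i] * y[k] for x, y in zip(src, tgt)) for k in range(d_tgt)]
           for i in range(d_src)]
    return solve(xtx, xty)


def map_vector(w: Matrix, x: Vector) -> Vector:
    """Map a Wikipedia2Vec-space row vector into the wordpiece space: x W."""
    return [sum(x[i] * w[i][k] for i in range(len(x))) for k in range(len(w[0]))]


def embed_sequence(tokens: Sequence[str], wordpiece_vecs: Dict[str, Vector],
                   entity_vecs: Dict[str, Vector], w: Matrix) -> List[Vector]:
    """Entity tokens get aligned entity vectors; all others get wordpiece vectors."""
    return [map_vector(w, entity_vecs[t]) if t in entity_vecs else wordpiece_vecs[t]
            for t in tokens]


# Hypothetical 2-d toy spaces.
word_vecs = {"paris": [1.0, 0.0], "french": [0.0, 1.0], "actor": [1.0, 1.0]}
wordpiece_vecs = {"paris": [1.0, 2.0], "french": [0.0, 1.0],
                  "actor": [1.0, 3.0], "speaks": [0.5, 0.5]}
entity_vecs = {"Jean_Marais": [2.0, 1.0]}

# Fit W only on words present in both vocabularies.
shared = sorted(word_vecs.keys() & wordpiece_vecs.keys())
w = fit_alignment([word_vecs[s] for s in shared], [wordpiece_vecs[s] for s in shared])

seq = embed_sequence(["Jean_Marais", "speaks", "french"], wordpiece_vecs, entity_vecs, w)
print(seq)  # [[2.0, 5.0], [0.5, 0.5], [0.0, 1.0]]
```

Solving the normal equations directly keeps the sketch dependency-free; at realistic dimensions a numerical library's least-squares solver would be the more sensible choice. Because W is fitted only on shared words, any entity vector from the same Wikipedia2Vec space can be mapped without touching the BERT encoder. This is consistent with the abstract's point that no further pretraining is required.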