How BERT Speaks Shakespearean English? Evaluating Historical Bias in Contextual Language Models
This work addresses bias evaluation for researchers in NLP and historical linguistics, but it is incremental: it applies existing methods to a new dataset.
The paper evaluates historical bias in BERT-based language models by measuring their adequacy for Early Modern and Modern English. It runs fill-in-the-blank tests on 60 masked sentences with three models, rates the predictions on a 5-point bipolar scale between the two varieties, and derives a weighted score for each model.
In this paper, we explore the idea of analysing the historical bias of contextual language models based on BERT by measuring their adequacy with respect to Early Modern (EME) and Modern (ME) English. In our preliminary experiments, we perform fill-in-the-blank tests with 60 masked sentences (20 EME-specific, 20 ME-specific and 20 generic) and three different models (i.e., BERT Base, MacBERTh, English HLM). We then rate the model predictions according to a 5-point bipolar scale between the two language varieties and derive a weighted score to measure the adequacy of each model to EME and ME varieties of English.
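The abstract does not spell out how the weighted score is computed, so the following is only a minimal sketch of one plausible reading. It assumes the 5-point bipolar scale is coded from -2 (clearly EME) to +2 (clearly ME), with 0 as neutral, and that the score is the count-weighted mean of the ratings. The coding, the function name `weighted_score`, and the sample ratings are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable

# Assumed coding of the 5-point bipolar scale (not specified in the abstract):
# -2 = clearly EME, -1 = rather EME, 0 = neutral, +1 = rather ME, +2 = clearly ME
SCALE = (-2, -1, 0, 1, 2)


def weighted_score(ratings: Iterable[int]) -> float:
    """Return the count-weighted mean rating, a value in [-2, 2].

    Negative values suggest a lean towards EME, positive values towards ME.
    """
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    invalid = [r for r in ratings if r not in SCALE]
    if invalid:
        raise ValueError(f"ratings outside the assumed scale: {invalid}")

    counts = Counter(ratings)
    # Weight each scale point by how many predictions received that rating.
    total = sum(point * counts[point] for point in SCALE)
    return total / len(ratings)


# Made-up ratings for five predictions of one hypothetical model.
example_ratings = [-2, -2, -1, 0, 1]
example_score = weighted_score(example_ratings)
```

Under this reading, a model whose predictions are rated mostly on the EME side gets a negative score, and one rated mostly on the ME side gets a positive score. Computing the score separately for the EME-specific, ME-specific and generic sentence groups would show whether a model adapts to the variety of the context.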