CL, Apr 15, 2022

Identifying and Measuring Token-Level Sentiment Bias in Pre-trained Language Models with Prompts

arXiv:2204.07289v1, 6 citations, h-index: 14
Originality: Incremental advance
AI Analysis

This work gives researchers and practitioners in AI ethics a way to detect and measure bias in black-box language models. It is an incremental advance, since it builds on existing prompt tuning methods.

The authors propose two token-level tests, SAT and SST, that use prompts as probes to detect sentiment bias in pre-trained language models. Both tests can identify bias, SST can also quantify it, and the results suggest that fine-tuning may amplify bias that is already present.

Owing to their superior performance, large-scale pre-trained language models (PLMs) have been widely adopted in many aspects of human society. However, we still lack effective tools to understand the potential bias embedded in these black-box models. Recent advances in prompt tuning show that it is possible to explore the internal mechanisms of PLMs. In this work, we propose two token-level sentiment tests, the Sentiment Association Test (SAT) and the Sentiment Shift Test (SST), which use the prompt as a probe to detect latent bias in PLMs. Our experiments on a collection of sentiment datasets show that both SAT and SST can identify sentiment bias in PLMs and that SST is able to quantify the bias. The results also suggest that fine-tuning may amplify the existing bias in PLMs.

