CL · Jun 4, 2024

JBBQ: Japanese Bias Benchmark for Analyzing Social Biases in Large Language Models

arXiv:2406.02050v4 · 8 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses social bias in Japanese LLMs, but it is incremental: it adapts an existing English benchmark (BBQ) to a new language.

The study examines social biases in Japanese large language models (LLMs) by constructing the JBBQ dataset from the English benchmark BBQ. It finds that open Japanese models with more parameters achieve higher accuracy on JBBQ but also higher bias scores. Prompts that warn about social biases and chain-of-thought prompting reduce the effect of biases in model outputs, though models still struggle to extract the correct evidence from Japanese contexts.

With the development of large language models (LLMs), social biases in these LLMs have become a pressing issue. Although there are various benchmarks for social biases across languages, the extent to which Japanese LLMs exhibit social biases has not been fully investigated. In this study, we construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias benchmark BBQ, with analysis of social biases in Japanese LLMs. The results show that while current open Japanese LLMs with more parameters show improved accuracies on JBBQ, their bias scores increase. In addition, prompts with a warning about social biases and chain-of-thought prompting reduce the effect of biases in model outputs, but there is room for improvement in extracting the correct evidence from contexts in Japanese. Our dataset is available at https://github.com/ynklab/JBBQ_data.

Code Implementations: 1 repo
