CL · Jun 8, 2025

Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages

arXiv:2506.07249v1 · 2 citations · h-index: 2 · Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
Originality: Incremental advance
AI Analysis

This work addresses bias interpretability in non-English language models, offering insights for fairness in multilingual AI, though it is incremental as it extends an existing metric.

The study adapted a bias attribution metric for agglutinative languages like Filipino, revealing that Filipino models are biased by entity-based themes such as people and objects, contrasting with action-heavy themes in English models.

Emerging research on bias attribution and interpretability has revealed how tokens contribute to biased behavior in language models processing English texts. We build on this line of inquiry by adapting the information-theoretic bias attribution score metric for implementation on models handling agglutinative languages, particularly Filipino. We then demonstrate the effectiveness of our adapted method by using it on a purely Filipino model and on three multilingual models: one trained on languages worldwide and two on Southeast Asian data. Our results show that Filipino models are driven towards bias by words pertaining to people, objects, and relationships, entity-based themes that stand in contrast to the action-heavy nature of bias-contributing themes in English (i.e., criminal, sexual, and prosocial behaviors). These findings point to differences in how English and non-English models process inputs linked to sociodemographic groups and bias.

