Debiasing Multilingual LLMs in Cross-lingual Latent Space
This work addresses bias reduction in multilingual AI systems. The contribution is incremental: it builds on existing debiasing methods and extends their cross-lingual applicability.
The paper tackles the limited cross-lingual transferability of debiasing techniques in multilingual LLMs. It proposes debiasing in a joint latent space rather than directly on LLM representations, which significantly improves both debiasing performance and transferability across four languages.
Debiasing techniques such as SentDebias aim to reduce bias in large language models (LLMs). Previous studies have evaluated their cross-lingual transferability by directly applying these methods to LLM representations, revealing their limited effectiveness across languages. In this work, we therefore propose to perform debiasing in a joint latent space rather than directly on LLM representations. We construct a well-aligned cross-lingual latent space using an autoencoder trained on parallel TED talk scripts. Our experiments with Aya-expanse and two debiasing techniques across four languages (English, French, German, Dutch) demonstrate that a) autoencoders effectively construct a well-aligned cross-lingual latent space, and b) applying debiasing techniques in the learned cross-lingual latent space significantly improves both the overall debiasing performance and cross-lingual transferability.
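The idea of debiasing in a latent space rather than on raw representations can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a projection-based debiasing step in the style of SentDebias (estimate a bias subspace from counterfactual pairs, then project it out), and it replaces the trained autoencoder with a fixed random linear map. All names (`encode`, `bias_subspace`, `remove_bias`, `W_enc`), the dimensions, and the toy data are hypothetical.

```python
import numpy as np

def bias_subspace(z_a, z_b, k=1):
    """Estimate a k-dimensional bias subspace from paired latent vectors.

    Each pair is centered on its own mean, so the stacked rows are
    +/- half the pair difference; the top right singular vectors of
    that stack span the estimated bias subspace.
    """
    pair_mean = (z_a + z_b) / 2.0
    centered = np.vstack([z_a - pair_mean, z_b - pair_mean])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]  # rows are orthonormal directions

def remove_bias(z, basis):
    """Subtract the projection of z onto the bias subspace."""
    return z - (z @ basis.T) @ basis

rng = np.random.default_rng(0)
hidden_dim, latent_dim, n_pairs = 16, 8, 50

# Assumption: a fixed linear map stands in for the trained encoder.
W_enc = rng.normal(size=(hidden_dim, latent_dim))

def encode(h):
    """Map (toy) LLM representations into the (toy) latent space."""
    return h @ W_enc

# Toy counterfactual pairs: same content, opposite shift along one direction.
bias_dir = rng.normal(size=hidden_dim)
h_base = rng.normal(size=(n_pairs, hidden_dim))
h_a = h_base + bias_dir + 0.05 * rng.normal(size=(n_pairs, hidden_dim))
h_b = h_base - bias_dir + 0.05 * rng.normal(size=(n_pairs, hidden_dim))

# Debias in the latent space instead of on the raw representations.
z_a, z_b = encode(h_a), encode(h_b)
basis = bias_subspace(z_a, z_b, k=1)
z_a_clean, z_b_clean = remove_bias(z_a, basis), remove_bias(z_b, basis)

# Average distance between the two members of a pair, before and after.
gap_before = np.linalg.norm(z_a - z_b, axis=1).mean()
gap_after = np.linalg.norm(z_a_clean - z_b_clean, axis=1).mean()
print(f"pair gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

In this toy setup the gap between paired latent vectors shrinks once the estimated direction is projected out. Whether the same holds across languages depends on how well the latent space is aligned, which is the question the paper's experiments address.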