CL · Mar 14, 2022

FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing

arXiv:2203.07228v1 · 661 citations · h-index: 83
Originality: Synthesis-oriented
AI Analysis

This paper addresses fairness evaluation for legal NLP practitioners, but the contribution is incremental because it builds on existing benchmarks and methods.

The authors tackled the problem of evaluating fairness in legal text processing by creating a multilingual benchmark covering four jurisdictions and five languages, and found that existing group-robust fine-tuning techniques often fail to mitigate performance disparities across groups.

We present a benchmark suite of four datasets for evaluating the fairness of pre-trained language models and the techniques used to fine-tune them for downstream tasks. Our benchmarks cover four jurisdictions (European Council, USA, Switzerland, and China), five languages (English, German, French, Italian and Chinese) and fairness across five attributes (gender, age, region, language, and legal area). In our experiments, we evaluate pre-trained language models using several group-robust fine-tuning techniques and show that performance group disparities are vibrant in many cases, while none of these techniques guarantee fairness, nor consistently mitigate group disparities. Furthermore, we provide a quantitative and qualitative analysis of our results, highlighting open challenges in the development of robustness methods in legal NLP.
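As an illustration of what "performance group disparities" can mean in practice, the sketch below scores a classifier separately for each group of a protected attribute, then summarizes the spread. The metric choices here (per-group accuracy, population standard deviation across groups, and worst-group score) are assumptions made for this example and are not taken from the abstract; the function names and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def group_scores(labels, preds, groups):
    """Return accuracy per group as a dict {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == y_hat)
    return {g: correct[g] / total[g] for g in total}


def disparity_report(labels, preds, groups):
    """Summarize how unevenly a model performs across groups.

    Assumed summary statistics (illustrative, not the paper's definition):
    - disparity: population standard deviation of the per-group scores
    - worst_group: the lowest per-group score
    """
    scores = group_scores(labels, preds, groups)
    values = list(scores.values())
    return {
        "per_group": scores,
        "disparity": pstdev(values),
        "worst_group": min(values),
    }


# Toy, made-up data: eight examples split across two hypothetical groups.
labels = [1, 0, 1, 1, 0, 1, 0, 0]
preds = [1, 0, 1, 1, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = disparity_report(labels, preds, groups)
print(report)
```

A disparity near zero means the groups are served about equally well, while the worst-group score shows how badly the least-served group fares. Any per-group metric, such as macro-F1, could be swapped in for accuracy.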

Code Implementations: 1 repo
