CL CY · Mar 13, 2023

Are Models Trained on Indian Legal Data Fair?

Sahil Girhepuje, Anmol Goel, Gokul S Krishnan, Shreya Goyal, Satyendra Pandey, Ponnurangam Kumaraguru, Balaraman Ravindran

arXiv:2303.07247v3 · 0.53 citations · h-index: 46

Originality: Synthesis-oriented

AI Analysis

This work addresses fairness issues in AI for the legal sector in India, highlighting a specific bias problem, but it is incremental as it extends existing fairness studies to a new context.

The paper investigated fairness in AI models trained on Indian legal data, specifically for bail prediction, and found a fairness disparity of 0.237 between Hindu and Muslim groups using demographic parity.

Recent advances and applications of language technology and artificial intelligence have enabled much success across multiple domains like law, medicine, and mental health. AI-based language models for tasks such as judgement prediction have recently been proposed for the legal sector. However, these models are rife with encoded social biases picked up from the training data. While bias and fairness have been studied across NLP, most studies primarily locate themselves within a Western context. In this work, we present an initial investigation of fairness from the Indian perspective in the legal domain. We highlight the propagation of learnt algorithmic biases in the bail prediction task for models trained on Hindi legal documents. We evaluate the fairness gap using demographic parity and show that a decision tree model trained for the bail prediction task has an overall fairness disparity of 0.237 between input features associated with Hindus and Muslims. Additionally, we highlight the need for further research and studies in the avenues of fairness/bias in applying AI in the legal sector with a specific focus on the Indian context.
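For readers unfamiliar with the metric, demographic parity compares how often a model predicts the positive outcome (here, bail granted) for each group. The sketch below uses the common textbook definition, the difference between the highest and lowest positive-prediction rates across groups. It is an illustration only: the function name `demographic_parity_gap`, the group labels `"A"` and `"B"`, and the toy predictions are all assumptions made up for this example, and the paper's own procedure (which works with input features associated with each group) may differ in detail.

```python
from collections import defaultdict


def demographic_parity_gap(predictions, groups):
    """Return (gap, rates) for binary predictions.

    rates maps each group to its share of positive (== 1) predictions;
    gap is the difference between the highest and lowest of those rates.
    Hypothetical helper for illustration, not the paper's code.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)

    rates = {g: positives[g] / totals[g] for g in totals}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates


# Toy data (invented): 1 = bail predicted, 0 = bail denied.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gap, rates = demographic_parity_gap(preds, groups)
print(rates)  # positive-prediction rate per group
print(gap)    # demographic parity gap
```

A gap of 0 would mean both groups receive positive predictions at the same rate. Under this definition, the 0.237 reported in the abstract would correspond to a difference of roughly 24 percentage points between the two groups.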
