IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders
The dataset is a useful resource for legal NLP researchers and practitioners in India, but the contribution is incremental: it is limited to a single domain and to dataset creation.
The authors tackled the scarcity of structured legal datasets in India by introducing IndianBailJudgments-1200, a benchmark dataset of 1200 Indian court bail judgments annotated across 20+ attributes, which supports tasks like outcome prediction and fairness analysis.
Legal NLP remains underdeveloped in regions like India due to the scarcity of structured datasets. We introduce IndianBailJudgments-1200, a new benchmark dataset comprising 1200 Indian court judgments on bail decisions, annotated across 20+ attributes including bail outcome, IPC sections, crime type, and legal reasoning. Annotations were generated using a prompt-engineered GPT-4o pipeline and verified for consistency. This resource supports a wide range of legal NLP tasks such as outcome prediction, summarization, and fairness analysis, and is the first publicly available dataset focused specifically on Indian bail jurisprudence.
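To make the fairness-analysis use case concrete, the following is a minimal sketch of how annotated records of this kind might be represented and summarized. The class `BailRecord`, the function `grant_rate_by_crime_type`, the field names, and the outcome labels "granted" and "rejected" are all assumptions made for illustration; the dataset's actual schema may differ. The sample records are invented placeholders, not entries from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BailRecord:
    # Hypothetical schema: field names mirror attributes mentioned in the
    # abstract, but the real dataset may name or encode them differently.
    bail_outcome: str                      # assumed labels: "granted" / "rejected"
    crime_type: str
    ipc_sections: list[str] = field(default_factory=list)
    legal_reasoning: str = ""


def grant_rate_by_crime_type(records):
    """Return {crime_type: fraction of records where bail was granted}."""
    granted = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record.crime_type] += 1
        if record.bail_outcome == "granted":
            granted[record.crime_type] += 1
    return {crime: granted[crime] / total[crime] for crime in total}


# Invented placeholder records for illustration only.
toy_records = [
    BailRecord("granted", "theft", ["379"]),
    BailRecord("rejected", "theft", ["379"]),
    BailRecord("granted", "cheating", ["420"]),
]
rates = grant_rate_by_crime_type(toy_records)
```

A per-group grant rate like this is only a starting point for fairness analysis; differences between groups would still need to be checked against confounders such as the severity of the charged sections.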