CL | Sep 12, 2024

Experimenting with Legal AI Solutions: The Case of Question-Answering for Access to Justice

Jonathan Li, Rohan Bhambhoria, Samuel Dahan, Xiaodan Zhu

arXiv:2409.07713v1 | 4.85 citations | h-index: 9 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making legal information accessible to laypersons. It is an incremental advance, since it applies existing AI methods to a specific domain.

The paper addresses the use of generative AI to help laypeople with legal questions. It proposes a human-centric legal NLP pipeline and introduces the LegalQA dataset, which pairs real questions with expert-written answers and citations. The results show that retrieval-augmented generation from only the 850 citations in the train set matches or outperforms internet-wide retrieval, despite drawing on 9 orders of magnitude less data.

Generative AI models, such as the GPT and Llama series, have significant potential to assist laypeople in answering legal questions. However, little prior work focuses on the data sourcing, inference, and evaluation of these models in the context of laypersons. To this end, we propose a human-centric legal NLP pipeline, covering data sourcing, inference, and evaluation. We introduce and release a dataset, LegalQA, with real and specific legal questions spanning from employment law to criminal law, corresponding answers written by legal experts, and citations for each answer. We develop an automatic evaluation protocol for this dataset, then show that retrieval-augmented generation from only 850 citations in the train set can match or outperform internet-wide retrieval, despite containing 9 orders of magnitude less data. Finally, we propose future directions for open-sourced efforts, which fall behind closed-sourced models.
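To make the retrieval-augmented setup concrete, here is a minimal sketch of retrieving from a small citation pool and assembling a prompt. It is not the authors' implementation: the bag-of-words cosine scorer, the function names (`retrieve`, `build_prompt`), the prompt layout, and the three placeholder passages are all assumptions made for illustration. The passages are invented stand-ins, not entries from LegalQA and not statements of law.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, citations, k=2):
    """Return the k passages most similar to the question (hypothetical scorer)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        citations,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, citations, k=2):
    """Place the retrieved passages ahead of the question for a generative model."""
    passages = retrieve(question, citations, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Invented placeholder passages standing in for a small citation pool.
CITATIONS = [
    "An employer must give written notice before terminating an employee.",
    "A tenant may dispute a rent increase within thirty days.",
    "A person charged with an offence has the right to counsel.",
]

if __name__ == "__main__":
    print(build_prompt("Can my employer fire me without notice?", CITATIONS, k=1))
```

A real system would likely swap the word-overlap scorer for a dense or BM25 retriever and pass the prompt to a language model. The sketch only shows the shape of the pipeline: a small, curated pool is searched instead of the open web.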
