LegalBench: Prototyping a Collaborative Benchmark for Legal Reasoning
This work addresses the need for standardized evaluation in legal AI. Its contribution is incremental: it prototypes a benchmark rather than solving the legal reasoning problem directly.
The authors address the problem of evaluating foundation models on legal reasoning tasks by proposing LegalBench, a collaborative benchmark built on the IRAC framework from legal scholarship. They present an initial seed set of 44 tasks to guide future development.
Can foundation models be guided to execute tasks involving legal reasoning? We believe that building a benchmark to answer this question will require sustained collaborative efforts between the computer science and legal communities. To that end, this short paper serves three purposes. First, we describe how IRAC, a framework legal scholars use to distinguish different types of legal reasoning, can guide the construction of a foundation-model-oriented benchmark. Second, we present a seed set of 44 tasks built according to this framework. We discuss initial findings and highlight directions for new tasks. Finally, inspired by the Open Science movement, we call on the legal and computer science communities to join our efforts by contributing new tasks. This work is ongoing, and our progress can be tracked here: https://github.com/HazyResearch/legalbench.